RESEARCH OVERVIEW


This report covers the period from April 1, 1985 through September 30,
1985. The research discussed here is described in more detail in several
published and unpublished reports cited below.

The CAD frame Schema has progressed to the point where it is useful for
sample chip designs. The interface to CIF is complete, and work has begun on
importing layout libraries. An interface to EDIF is being installed.
Simulators can now be connected, and thought is going into organization of VLSI
libraries. A plan for the distribution of Schema is now being worked out.
Members of the DARPA VLSI community will be able to get copies in the Fall of
1985 or Spring of 1986.

Previous results on waveform bounding have been generalized to large
classes of problems described in canonical control-theory form. Work has
begun on models for interconnect taking account of line inductance. This
domain is less general than RLC networks, and there is hope that some of the
previously derived bounds still apply. Indeed, some such results are reported
here.

During this period a novel device, the UV write-enabled PROM, was
reported at a conference. Work continues on developing useful circuits
employing this device.

A new program in CAR (Computer-Aided Reliability) of VLSI circuits has
begun. The initial emphasis is on modeling three effects: metal migration,
time-dependent dielectric breakdown, and hot-electron effects. The objective
is to create software which analyzes layouts for possible reliability problems
and gives advice to the designer.

Work continues on the promising "fat-tree" interconnection network. Some
improved message-delivery algorithms have been developed. This and a variety
of improvements in routing, arithmetic and graph algorithms, and
communications networks are described in this report.
THE DESIGN OF SCHEMA


Over the past six months a great deal of effort has gone into Schema,
enough that we have begun to use it for some sample chip designs. Our
colleagues at Harris have made good progress on the PC board development
system, and the simulation environment is beginning to be used.

Kerry O'Neill spent this summer developing routines for dealing with CIF
for Schema. As a result Schema can now read and write its layout information
in CIF. In addition, Kerry has begun developing a library of lambda-based
CMOS and NMOS layouts based on those contained in the Newkirk and Mathews
book, "The VLSI Designer's Library." Texas Instruments has contributed an
EDIF parser and reader for Schema which Kerry is now integrating into the rest
of the system.

Margaret St. Pierre has been continuing her work on the Schema simulation
environment. The waveform database has been completed and is being used in
several ongoing projects. Margaret has just finished the Schema interface to
Spice, where Schema makes use of a remote Spice server via the network. At
the moment a VAX running Unix is being used but we are exploring the use of an
IBM 4381 as a faster alternative. Chris Terman wrote a quick forward Euler
simulator that has been incorporated. This simulator is best used as an
illustration of interfacing with Schema, but it can also be used for small,
simple circuits by the emerging AI tools.
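For readers unfamiliar with the method, the following is a minimal sketch of forward Euler integration applied to a single RC stage. It is not the simulator incorporated into Schema; the function name, parameters, and component values are assumptions chosen only for illustration.

```python
def forward_euler_rc(r_ohms, c_farads, v_in, dt, steps, v0=0.0):
    """Integrate C*dv/dt = (v_in - v)/R with the explicit (forward) Euler rule."""
    tau = r_ohms * c_farads
    v = v0
    trace = [v]
    for _ in range(steps):
        dv_dt = (v_in - v) / tau      # slope evaluated at the current point only
        v = v + dt * dv_dt            # explicit update
        trace.append(v)
    return trace


# Hypothetical example: 1 kOhm driving 1 pF (tau = 1 ns), 5 V step,
# 10 ps time step, 500 steps (5 ns of simulated time).
wave = forward_euler_rc(1e3, 1e-12, 5.0, 10e-12, 500)
print(f"v(5 ns) is about {wave[-1]:.3f} V")
```

For this circuit the explicit update is stable only when the time step is smaller than 2RC, which is one reason such a simulator suits small, simple circuits.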

The PC board development sub-system raised an interesting problem which
we would also have in building VLSI libraries: how to organize modules in a
fashion that would allow their selection by their behavior. For instance, a
designer might be using 2-input NAND gates. When it comes time to bind the
logic gate to a particular package, the designer might wish to know what TTL
packages have 2-input NAND gates with less than 6 nanosecond propagation
delays. Thus a simple hierarchical structure is not adequate for representing
the relationships among the different package types. To deal with this
problem we have been using the Kandor knowledge representation language
developed at the Schlumberger Palo Alto Research Center as a knowledge-indexed
database into our hierarchy of Schema modules. Kandor databases are able to be
components of a design hierarchy and thus provide an alternative indexing
mechanism for modules in Schema. This structure seems to be working quite
well, and we are now evaluating different strategies for fully integrating it
with Schema.

A more detailed report on this work will follow.
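The kind of query described above, selecting modules by behavior rather than by position in a hierarchy, can be sketched as follows. This is plain Python, not Kandor; the `Package` record, the `select` helper, and every catalog entry (names and delays) are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str        # hypothetical part name, for illustration only
    family: str      # logic family, e.g. "TTL"
    gate: str        # logic function provided, e.g. "NAND"
    inputs: int      # inputs per gate
    t_pd_ns: float   # propagation delay in nanoseconds (made-up value)


def select(packages, **required):
    """Return the packages whose attributes satisfy every requirement.

    A requirement is either a literal value or a predicate function."""
    def satisfies(pkg):
        for attr, want in required.items():
            have = getattr(pkg, attr)
            if callable(want):
                if not want(have):
                    return False
            elif have != want:
                return False
        return True
    return [p for p in packages if satisfies(p)]


# Made-up catalog entries.
CATALOG = [
    Package("pkg-A", "TTL", "NAND", 2, 9.0),
    Package("pkg-B", "TTL", "NAND", 2, 4.5),
    Package("pkg-C", "TTL", "NOR", 2, 5.0),
    Package("pkg-D", "CMOS", "NAND", 2, 5.5),
]

# "Which TTL packages have 2-input NAND gates faster than 6 ns?"
fast_nands = select(CATALOG, family="TTL", gate="NAND", inputs=2,
                    t_pd_ns=lambda t: t < 6.0)
print([p.name for p in fast_nands])
```

The query cuts across family, function, and speed at once, which is why a single fixed hierarchy is an awkward fit.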

Schema's usefulness is steadily increasing, and the number of requests
for copies has been escalating. We plan to make copies available to patient
and indulgent colleagues this fall, and to make it more widely available in
Spring 1986, after we have run a few designs through it.
THE WAVEFORM BOUNDING APPROACH TO DELAY ESTIMATION


We have recently achieved progress on several fronts. In the area of
delay bounds for on-chip interconnect, we have finally proved a conjecture
which, at a very modest cost in computer time, enables us to further tighten
the original bounds* on signal delay in RC tree networks. We have discovered
the theoretical foundations for a class of waveform bounding techniques and
extended them to include a more general class of dynamic systems. We have
initiated a project to extend the methods to bipolar ECL circuits and
successfully completed the first part of it. And we have discovered several
practical tricks to speed up computation.


We have begun studying the more difficult problem of delay bounds for
inter-chip interconnect with significant inductance and have proved two
enlightening preliminary results. We have also found and rigorously justified
a promising method for "relaxing" bounds to produce tighter ones. More
complete summaries are given in the individual sections below.


In the area of RC models for on-chip interconnect we have found a way to
further tighten the original bounds* by exploiting the spatial convexity of
interconnect voltage during transients in a novel way. A limited version of
this idea was reported.* Mr. David Standley has discovered and proved a
general version that is not yet published.


We had hoped to find that the original work* which stimulated this
research does not rely in any fundamental way on many of the special
properties of the RC tree networks to which it was applied. It turns out that
its validity depends only on the sign pattern of the underlying system matrix,
and thus it can be applied to a large class of dynamic systems in other areas
of science and engineering.*


The research proposed for this grant was initially limited to MOS logic,
but there is also an urgent need for fast timing analysis techniques suitable
for ECL (emitter-coupled logic) bipolar chips. The project we have initiated
to meet this need has three parts: i) extending the original work* to include
networks with resistive leakage paths to ground modeling the base current
pathway in bipolar logic, ii) finding a rational and easily automated method
for modeling the driving-point impedance of bipolar logic gates by simple RC
circuits, and iii) finding a computationally fast way to calculate the
required elements of the resistance matrix R for an interconnect network with
leakage paths to ground. Mr. Peter O'Brien has completed the first of these
three tasks this summer; a write-up will soon be available.

* J. Rubinstein, P. Penfield, Jr., and M. A. Horowitz, "Signal Delay in RC
Tree Networks," IEEE Trans. CAD, vol. 2, no. 3, pp. 202-211, July, 1983.

* Q. Yu, J. L. Wyatt, Jr., C. Zukowski, H. N. Tan and P. O'Brien, "Improved
Bounds on Signal Delay in Linear RC Models for MOS Interconnect," Proc. 1985
IEEE Int. Symp. on Circuits and Systems, Kyoto, Japan, June, 1985, pp. 903-
906.

* J. L. Wyatt, Jr., C. A. Zukowski, and P. Penfield, Jr., "Step Response
Bounds for Systems Described by M-Matrices, with Application to Timing
Analysis of Digital MOS Circuits," to appear in Proceedings of the 24th IEEE
Conference on Decision and Control, Ft. Lauderdale, FL, December, 1985; also
MIT VLSI Memo No. 85-257, September, 1985.

Advances to save computation time can be quite important in practice.
One of these is based on the observation that the electrical models included
with MOS standard-cell libraries commonly give two values for the cell's
output resistance: one for output high-to-low transitions and another for
low-to-high. The straightforward way of calculating the time constants needed
in waveform bounding requires two entirely separate computations, one for each
value of the driver resistance. But we have found a fast way of updating the
initial computation to allow for a change in driver resistance that cuts this
part of the computation time almost in half.
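The report does not describe the update itself. As a hedged illustration of why such an update can be cheap, the sketch below computes one time constant, the Elmore delay, at every node of an RC tree and then adjusts it for a new driver resistance without re-traversing the tree. The tree encoding, function names, and element values are assumptions for this example; other time constants used in waveform bounding may need a different update.

```python
def elmore_delays(parent, r, c):
    """Elmore delay at every node of an RC tree.

    parent[i] is the parent of node i (None for node 0, the root, which is
    driven through r[0], the driver resistance). Nodes are numbered so that
    parent[i] < i. r[i] is the resistance of the branch into node i and c[i]
    the capacitance from node i to ground."""
    n = len(parent)
    downstream = list(c)                    # capacitance at or below each node
    for i in range(n - 1, 0, -1):           # children before parents
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(n):                      # parents before children
        upstream = 0.0 if parent[i] is None else delay[parent[i]]
        delay[i] = upstream + r[i] * downstream[i]
    return delay


def update_driver(delay, c, old_r, new_r):
    """The driver resistance lies on the path to every node, so changing it
    shifts every Elmore delay by (new_r - old_r) * total capacitance."""
    shift = (new_r - old_r) * sum(c)
    return [d + shift for d in delay]


# Hypothetical four-node tree; resistances in ohms, capacitances in pF.
parent = [None, 0, 0, 1]
caps = [1, 2, 3, 4]
slow = elmore_delays(parent, [100, 10, 20, 30], caps)
fast = update_driver(slow, caps, 100, 150)
print(slow, fast)
```

The second set of delays costs one addition per node instead of a second pass over the network.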

In the area of linear RLC models for inter-chip interconnect, the
interconnect lines on printed circuit boards have much greater inductance than
on-chip lines due to the absence of a nearby ground plane and, secondarily,
their greater typical lengths. Fast methods of estimating and bounding
propagation delay through such wires are urgently needed, but the techniques
used in the early research* cannot be readily applied, primarily because the
step response of lines with inductances exhibits overshoot and ringing. In
searching for computationally fast methods to study delay in such lines, we
have made some initial discoveries that are interesting in their own right and
lead us to hope that more substantial progress will be possible. Mr. David
Standley has found that for any such RLC line, possibly nonuniform and
branched, the first moment of the impulse response (perhaps a reasonable
estimate of delay in some cases) is utterly unaffected by the presence and
numerical value of all inductors, and the second moment (a reasonable measure
of the step response rise time in many cases) decreases monotonically with the
value of each inductor.
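The simplest instance of this result can be checked by hand. The following worked example, which is an illustration and not the general proof, takes a single series R-L section driving a capacitor C to ground, with transfer function H(s) and impulse response h(t); the symbols m_1 and m_2 for the first and second moments are notation assumed here.

```latex
\begin{align*}
H(s) &= \frac{1}{1 + RCs + LCs^2}
      = 1 - RC\,s + \left(R^2C^2 - LC\right)s^2 - \cdots, \\
m_1 &= \int_0^\infty t\,h(t)\,dt = -H'(0) = RC, \\
m_2 &= \int_0^\infty t^2 h(t)\,dt = H''(0) = 2\left(R^2C^2 - LC\right), \\
m_2 - m_1^2 &= R^2C^2 - 2LC .
\end{align*}
```

In this one-section case the first moment does not contain L at all, while the second moment, taken either about the origin or about the mean, falls linearly as L grows.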

In the area of bound relaxation, we have discovered a method by which
the Waveform Relaxation techniques* developed for the efficient exact analysis
of digital MOS circuits can be extended to include the relaxation of bounds.*
In this project, bounding is considered as a framework within which to combine
the latest algorithms for VLSI simulation in a form where uncertainty can be
measured and therefore managed efficiently. The use of relaxation techniques
allows partitioning to exploit latency and achieve a computation time that
scales roughly linearly with circuit size. The use of simplified models to
calculate response bounds allows efficient approximating algorithms, such as
piecewise linear simulation, to be exploited. Bounding work for linear
circuits can also be incorporated as one step in generating bounds on the
responses of more complex circuits.

This last is a broadly focused and ambitious project. A number of
exciting results have been obtained, but considerably more work is needed
before it can serve as the basis for a practical CAD system.

* E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli, "The
Waveform Relaxation Method for Time-Domain Analysis of Large Scale Integrated
Circuits," IEEE Trans. CAD, vol. CAD-1, no. 3, pp. 131-145, July, 1982.

* C. Zukowski, J. L. Wyatt, Jr., and L. A. Glasser, "Bounding Techniques and
Applications for VLSI Circuit Simulation," Proc. 1985 IEEE Int. Symp. on
Circuits and Systems, Kyoto, Japan, June, 1985, pp. 163-165.
HIGH PERFORMANCE CIRCUIT DESIGN


Four new write-enabled UV PROM designs have been submitted to MOSIS,
three in nMOS and one in CMOS. One of the nMOS designs has an area of only 19
by 37 lambda. This design is similar to an nMOS static RAM. Another design,
suggested by Prof. Paul Penfield, is similar to an LSSD latch. Should these
designs prove to be functional and robust, they will be added to the MOSIS
library.

The computer-aided circuit reliability effort is progressing nicely. The
objective of this project is to build a circuit simulator that enables the
designer to make choices among circuit configurations on the basis of
reliability metrics such as the median time to failure. We now have
preliminary models for the three phenomena to be incorporated into the first
simulator: metal migration, time-dependent dielectric breakdown, and
hot-electron effects. A reliability simulator, RELIC, is planned to be built
on top of an existing simulator. We are presently evaluating SAMSON from CMU
and RELAX from Berkeley for this purpose.

We have been developing a theory which predicts lower bounds on the
conversion times of A/D converters as a function of the probability of a fault
caused by synchronizer failures. We now have experimental verification of
this phenomenon. The general conclusion one reaches as a result of this study
is that flash converters do not have a significant advantage over self-timed
successive approximation converters for extremely high reliability converters.
This is because one spends the vast majority of the time waiting for the one
slow bit (the one with the input near the metastable point of the comparator)
to settle out.


With the help of Prof. John Wyatt, detail and rigor are being added to
the work on reliability/noise margin/speed tradeoffs in digital MOS circuits.
ARCHITECTURAL DESIGN


Ron Greenberg and Professor Leiserson have continued to improve their
algorithm for routing messages in the "fat-tree" interconnection network. The
new algorithm follows the same general approach of randomizing in the choice
of whether or not to send a particular message in a particular delivery cycle,
but it achieves superior running times. As before, the performance of the
algorithm is measured in terms of the load factor λ(M) of the set M of
messages to be routed, where the load factor can be defined in a general
network setting as the maximum over all cuts in the network of the ratio of
the number of messages which must pass through the cut to the capacity of
the cut. For λ(M) = Ω(lg n lg lg n), the new algorithm will, with high
probability, route the set of messages, M, in O(λ(M)) delivery cycles,
thereby meeting the lower bound. Various related results have also been
obtained.
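The load factor defined above can be computed directly for a tree-shaped network. The sketch below assumes a complete binary tree with processors at the leaves and one undirected channel above each subtree; the function name, the per-level capacity list, and the example messages are hypothetical. Only single-channel cuts are examined, which suffices in a tree because the ratio for any larger cut cannot exceed the largest ratio among the channels it severs.

```python
from collections import Counter


def load_factor(messages, height, capacity):
    """Load factor of a message set on a complete binary tree network.

    messages: (src, dst) pairs of leaf indices in range(2**height).
    capacity[l]: capacity of the channel above a level-l subtree
                 (level 0 is a single leaf).
    Returns the maximum over channels of (messages crossing) / capacity."""
    load = Counter()
    for src, dst in messages:
        for level in range(height):            # the root has no channel above it
            a, b = src >> level, dst >> level   # level-l subtrees holding src, dst
            if a == b:
                break                           # common subtree reached
            load[(level, a)] += 1               # message leaves src's subtree
            load[(level, b)] += 1               # and enters dst's subtree
    return max((count / capacity[level] for (level, _), count in load.items()),
               default=0.0)


# Hypothetical example: 4 processors, leaf channels of capacity 1,
# channels one level up of capacity 2.
demo = load_factor([(0, 3), (1, 2), (0, 1)], height=2, capacity=[1, 2])
print(demo)
```

In this example the busiest channels are those above leaves 0 and 1, each carrying two messages over a channel of capacity one.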

Alexander Ishii has developed a timing scheme for transmitting messages
between processors in Professor Leiserson's "fat-tree" interconnection
network. In addition, he and Bruce Maggs have implemented an NMOS VLSI
fat-tree network interface utilizing the scheme. Work on fat-tree simulations
has temporarily stopped, in anticipation of either new simulation facilities
or a more optimized simulation program. Simulations with 8000 processors have
already been run, however, and results so far appear encouraging. Currently,
he and Professor Leiserson are investigating the verification and complexity
of multiphase MOS clocking disciplines, in hopes of developing a theoretically
sound basis for optimizing their use.

Cindy Phillips completed her master's thesis this summer. The thesis
contains previously completed joint work with Professor Leiserson on finding
the connected components of rectangles in the plane. It also presents a data
structure that can be used in scanning-based algorithms which must maintain
sets of segments. This data structure is much simpler than previous
structures for this application. The data structure requires O(n) space for n
segments. Insertions, deletions, and finding one segment that overlaps a test
segment can all be performed in O(lg n) time.

Miller  Maley  continued  work  on  the  mathematical  foundations  of  the  VLSI 
wiring  model  he  developed  with  Professor  Leiserson.  This  work  aims  to  extend 
the  applicability  of  Maley  and  Leiserson's  new  wire  routing  and  layout  compac- 
tion  algorithms. 

Tom  Cormen  and  Professor  Leiserson  have  designed  a  hyperconcentrator 
switch  for  routing  bit-serial  messages  in  highly  parallel  routing  networks. 
The  switch,  which  has  a  highly  regular  layout,  has  been  implemented  in  nMOS. 
It  takes  advantage  of  the  relatively  fast  performance  of  large  fan-in  NOR 
gates  in  MOS  technologies.  The  same  architecture  works  for  domino  CMOS  as 
well.  A  signal  incurs  exactly  2  lg  n  gate  delays  through  an  n-input  hyper¬ 
concentrator  switch.  This  switch  acts  as  a  perfect  concentrator  switch  and 
can  be  used  to  increase  the  performance  of  many  existing  routing  networks. 
Johan Hastad and Professor Leighton have developed improved bounds on the
size of the smallest known circuits for division with O(log N) depth.
Previously, the best circuits known for division of N-bit numbers in O(log N)
steps had size O(N^). The circuit developed by Hastad and Leighton also
works in O(log N) steps but needs only $O(N^{1+\epsilon})$ gates, a significant
improvement. The circuit is not yet practical, but the key ideas needed to
decrease the size of the circuit could well find application in other
problems.

Bonnie Berger and Professor Leighton have developed an improved algorithm
for 2-layer channel routing. For two-point nets, the algorithm uses
$d + O(d^{1/2})$ tracks to route a problem with density d. For multipoint net
problems with density d, the bound is $2d + O(d^{1/2})$. These bounds are a
factor of two better than the best previously known bounds of Rivest, Baratz
and Miller. The new algorithm allows wires to overlap for unit segments in the
vertical direction (i.e., it uses the unit-vertical-overlap model). Most
channel routing problems with density d require d tracks, even if arbitrary
overlap is allowed, so the new results are very close to optimal in a
strong sense.
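Since the bounds above are stated in terms of the density d, a short sketch of how density can be computed may help. The convention assumed here, which is one of several in use, counts for each column the nets whose horizontal span from leftmost to rightmost terminal includes that column; the function name and the example nets are hypothetical.

```python
def channel_density(nets):
    """Density of a channel routing problem.

    nets: one collection of terminal column indices per net.
    Returns the maximum, over columns, of the number of nets whose span
    [leftmost terminal, rightmost terminal] contains that column."""
    delta = {}
    for columns in nets:
        lo, hi = min(columns), max(columns)
        if lo == hi:
            continue                      # confined to one column: no track needed
        delta[lo] = delta.get(lo, 0) + 1              # span opens at lo
        delta[hi + 1] = delta.get(hi + 1, 0) - 1      # span closes after hi
    density = current = 0
    for x in sorted(delta):               # sweep columns left to right
        current += delta[x]
        density = max(density, current)
    return density


# Hypothetical instance: three two-point nets overlap around columns 3 and 4.
example_nets = [(1, 4), (2, 6), (3, 5), (7, 9), (5, 5)]
print(channel_density(example_nets))
```

Density is a lower bound on the number of tracks when wires on the same layer may not overlap, which is why bounds of the form d plus a lower-order term are described as close to optimal.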

Thang Bui and Professor Leighton continue to work on the graph bisection
problem. They have now shown that greedy algorithms perform well (i.e.,
almost always find the optimal bisection) on random d-regular graphs having
small bisection width, for large enough degree d. The work indicates that
heuristics which use 2-change operations (such as the well known Kernighan-Lin
heuristic) will also perform well for this class of graphs. It is hoped that
this result can be extended to classes of graphs with larger bisection widths
and smaller degrees. Already the work has led to improved heuristics for the
graph bisection problem which appear to work very well on a wide variety of
randomly generated graphs.

Professor Goldwasser is continuing her work on pseudo-random number
generation. She, together with Dr. Goldreich and Professor Micali, showed how
to construct pseudo-random functions from k bits to k bits which are
indistinguishable from truly random functions in probabilistic polynomial (in
k) time, based on the assumption that one-way functions exist. Pseudo-random
functions can be used for generating test data for VLSI circuits.
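The report does not give the construction. The sketch below follows the binary-tree construction commonly associated with these authors, in which a length-doubling pseudo-random generator is applied once per input bit. It starts from a generator rather than from a one-way function, and the generator itself is a placeholder built on SHA-256, which is an assumption made only so the example runs; it is not a proven pseudo-random generator, and the function names are hypothetical.

```python
import hashlib


def prg(seed):
    """Stand-in length-doubling generator: k bytes in, two k-byte halves out.

    ASSUMPTION: SHA-256 in counter mode is used purely as a placeholder."""
    k = len(seed)
    out = b""
    counter = 0
    while len(out) < 2 * k:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:k], out[k:2 * k]


def tree_prf(key, x):
    """Evaluate the keyed function at input x by walking a binary tree.

    The key labels the root; each input bit (most significant first) selects
    the left or right half of the generator output as the next label."""
    state = key
    for byte in x:
        for shift in range(7, -1, -1):
            left, right = prg(state)
            state = right if (byte >> shift) & 1 else left
    return state


# Hypothetical 16-byte key; outputs could serve as reproducible test vectors.
key = bytes(range(16))
print(tree_prf(key, b"\x00\x01").hex())
```

Because the output is a deterministic function of the key and the input, a test pattern can be regenerated on demand from its index instead of being stored.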

Professor Goldwasser, Professor Micali, Benny Chor (who has just received
his PhD) and Professor Awerbuch designed a method by which an n-node
semi-synchronous broadcast network with o(log n) faults can be transformed into
an n-node simultaneous broadcast network with o(log n) faults. It is
especially easy to design fault-tolerant algorithms for the latter model.
ABSTRACT

In the paper, we describe a polynomial time algorithm that, for every input
graph, either outputs the minimum bisection of the graph or halts without
output. More importantly, we show that the algorithm chooses the former course
with high probability for many natural classes of graphs. In particular, for
every fixed $d \ge 3$, all sufficiently large $n$ and all

$$b = o\!\left(n^{1 - 1/\lfloor (d+1)/2 \rfloor}\right),$$

the algorithm finds the minimum bisection for almost all d-regular labelled
simple graphs with 2n nodes and bisection width b. For example, the algorithm
succeeds for almost all 5-regular graphs with 2n nodes and bisection width
$o(n^{2/3})$. The algorithm differs from other graph bisection heuristics (as
well as from many heuristics for other NP-complete problems) in several
respects.

Most notably:

(i) the algorithm provides exactly the minimum bisection for almost all input
graphs with the specified form, instead of only an approximation of the
minimum bisection,

(ii) whenever the algorithm produces a bisection, it is guaranteed to be
optimal (i.e., the algorithm also produces a proof that the bisection it
outputs is an optimal bisection),

(iii) the algorithm works well both theoretically and experimentally,

(iv) the algorithm employs global methods such as network flow instead of
local operations such as 2-changes, and

(v) the algorithm works well for graphs with small bisections (as opposed to
graphs with large bisections, for which arbitrary bisections are nearly
optimal).
1 Introduction

Massively parallel fine-grained multiprocessors [1,2] require massive amounts of
communication. Interprocessor communication bandwidth is thought to strongly impact the
performance of multiprocessors by imposing limits on the degree to which processors can
exchange information and cooperate on a single problem. In this paper we examine the
physical limits imposed on interprocessor communication by the capacity of the
communication medium. Other constraints, such as those discussed in [3], provide additional
limitation on the performance of large digital multiprocessors. In the spirit of massive
parallelism, we use continuous rather than discrete variables in the formulation of the model.
This choice makes the mathematics cleaner without sacrificing the physics.

We know, from information theory, that the transmission of one bit of information
requires on the order of energy $kT$, where $k$ is Boltzmann's constant and $T$ is absolute
temperature. Thus, high data rates require large quantities of power, independent of how
the information is transmitted or coded. If we examine a transmission medium of fixed
diameter, whether it be a length of coaxial cable or a fiber optic link, the information
velocity is limited to $c$, the speed of light. If the "wire" diameter is $D$, then the energy
density in the wire must be at least $BE_0/(cD^2)$, where $B$ is the bit rate and $E_0$ is the
minimum energy required to transmit one bit. All physical media, with the exception
of absolute vacuum, have a maximum energy density that they can withstand without
breaking down. Since electromagnetic energy is usually the information carrying medium
of choice, limits on the energy density often arise out of electric forces which become
significant when compared to the forces binding electrons to their atoms. These electric
fields will cause the medium to behave non-linearly, distorting the information. Large
enough fields will cause destruction of the medium. One can counteract this phenomenon
by either making $D$ larger or running several cables in parallel. We therefore argue that
a fundamental quantity which is physically limited is information density. The highest
values of information density are found in optical fibers and VLSI chips, where values of
$10^{19\pm2}$ b/(s·m$^2$) can be observed. Usually communication densities are much lower.
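As a rough numerical illustration of the energy-density expression above, the sketch below evaluates it with the minimum energy per bit set to exactly kT. That choice, the function name, and the example link (1 Gb/s through a 1 mm "wire" at 300 K) are assumptions; the text only says the energy is on the order of kT.

```python
BOLTZMANN_J_PER_K = 1.380649e-23           # Boltzmann's constant
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0     # speed of light in vacuum


def min_energy_density(bit_rate, diameter_m, temperature_k=300.0):
    """Lower bound on energy density (J/m^3) in a wire of the given diameter.

    Evaluates B * E0 / (c * D**2) with E0 taken as k*T (an assumption)."""
    e0 = BOLTZMANN_J_PER_K * temperature_k
    return bit_rate * e0 / (SPEED_OF_LIGHT_M_PER_S * diameter_m ** 2)


def information_density(bit_rate, diameter_m):
    """Bit rate per unit cross-sectional area, in b/(s*m^2)."""
    return bit_rate / diameter_m ** 2


# Hypothetical link: 1 Gb/s through a 1 mm diameter medium at 300 K.
density = min_energy_density(1e9, 1e-3)
print(f"{density:.3e} J/m^3, {information_density(1e9, 1e-3):.1e} b/(s*m^2)")
```

Halving the diameter at a fixed bit rate quadruples both the required energy density and the information density, which is why enlarging D or running cables in parallel relieves the constraint.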

While information densities may be far from their fundamental limits in current
technologies, there will always be practical limits and costs associated with different densities.
Therefore, it is important to study the general relationship between information densities
and interprocessor communication. Given an array of processors and a communication
medium of bounded volume, the interprocessor communication requirements necessarily
generate information density requirements on the medium. Limits on information density
therefore impose limits on the degree to which processors can communicate and hence
cooperate. It is these limits which we will discuss in this paper.

2  Statement  of  Problem 

Assume that K-dimensional space is filled with processing elements with a density $\rho$,
where $\rho$ is a function of position. Processors in two dimensional space might represent an
array on a planar integrated circuit. Large systems, consisting of many VLSI sub-systems,
are constructed to take advantage of all three physical dimensions. In this section, we
present the machinery necessary to analyze interprocessor communication in such an array.
For simplicity we assume that all functions discussed are continuous and bounded.

First consider the information flowing out from a single processing element. The
communication requirement for the entire array is the sum over all elements of these
outgoing information flows. If all communication between every two processors is assumed
to be through the shortest path, the flux density of bandwidth from a single processor is
a radial vector field. That is, each processor emits, along a radial line, that information
required by other processors in that direction. Near the transmitter, the information
density due to the transmitter must contain the information intended for all receivers in
a given direction. Further away from the transmitter, the information flux decreases as
various receivers remove data. $\mathbf{F}(q,s)$ represents the information flux density at point $q$
originating from a processor at point $s$. $f(q,s)$ denotes the scalar magnitude of the flux
and has the units b/(s·proc·m$^{K-1}$). We have the simple relation

$$f(q,s) = |\mathbf{F}(q,s)|. \tag{1}$$

The flux from a single processor can be viewed as a vector field, but the contributions
from different processors do not add as vectors. If two processors are exchanging
information at equal rates, the total communication taking place is not zero. As a result, our
definition assumes that the flux magnitude is always positive and incoming information is
represented in the positive outgoing flux from other processors. The function $f(q,s)$ must
be a decreasing function of the distance $|q-s|$ because other processors can only receive
information from the processor at point $s$. In any dimension larger than one, the surface
area available for communication increases with distance. Therefore, when non-zero, $f$
must fall with distance at least as fast as $|q-s|^{1-K}$, the rate seen in the absence of
receiving processors. Figure 1 illustrates the relevant geometries. In the absence of receivers, the
total information travelling outward from $s$ is independent of distance, but the information
density is not.

To express the notion of receiving processors we define the quantity $I(q,s)$ to represent
the information bandwidth flowing from a processor at point $s$ to one at point $q$. $I$ has the
units of b/(s·proc$^2$). Intuitively, the information flowing to a processor at $q$ is the amount
of flux disappearing there, scaled by the processor density at $q$. This can be expressed as

$$\rho(q)\,I(q,s) = -\nabla \cdot \mathbf{F}. \tag{2}$$
Using polar coordinates defined about the point $s$, (2) can be re-expressed in integral form
as

This form assumes that the boundary condition of zero information flux at infinity is
satisfied, that is, all information sent by a processor is received by others.
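One way to write the integral form, offered here as a sketch rather than as the paper's own expression, follows from integrating (2) along a ray leaving $s$. The shorthand is an assumption: $r = |q - s|$, and $f(r)$, $\rho(r')$ and $I(r')$ denote the flux magnitude, the processor density, and the bandwidth received from $s$ at distance $r$ or $r'$ from $s$ along that ray.

```latex
\frac{1}{r^{K-1}} \frac{\partial}{\partial r}\left( r^{K-1} f(r) \right) = -\rho(r)\, I(r)
\quad\Longrightarrow\quad
f(r) = \frac{1}{r^{K-1}} \int_{r}^{\infty} r'^{\,K-1}\, \rho(r')\, I(r')\, dr'
```

The left-hand relation is the divergence of a radial field in K dimensions; the upper limit of the integral uses the condition that the flux vanishes at infinity.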

To  find  the  total  communication  density  at  a  particular  point,  we  add  the  magnitudes 
of  the  information  fluxes  originating  from  all  processors.  The  total  information  flux  through 
any  point  q  is  given  by 


$$\Phi(q) = \int_V f(q,s)\,\rho(s)\,ds. \tag{4}$$


$\Phi$ represents the density of bits per second passing through an infinitesimal area of space
and has the units of bits/(s·m$^{K-1}$). Said another way, $\Phi$ represents information rate
density. It is this density which, we argued in the introduction, is a physically significant
and limited quantity. Equation (4) can be simplified by assuming spherical symmetry
about $q = 0$ and considering only the bandwidth density at the origin. As a result of
symmetry, the only necessary coordinate is $r$, the distance from the origin. Equation (4)
then becomes


\[ \Phi(0) = C_K \int_0^{\infty} r^{K-1} f(0, r)\, \rho(r)\, dr, \tag{5} \]

where

\[ C_K = \begin{cases}
2 & \text{for } K = 1 \\
2\pi & \text{for } K = 2 \\
4\pi & \text{for } K = 3 \\
\dfrac{(K-3)\, C_{K-1}\, C_{K-2}}{(K-2)\, C_{K-3}} & \text{in general.}
\end{cases} \]


Combining  (3)  and  (5),  we  obtain 


\[ \Phi(0) = C_K \int_0^{\infty} \rho(r) \int_r^{\infty} (r^*)^{K-1}\, \rho(r^* - r)\, I(r - r^*, r)\, dr^*\, dr. \tag{6} \]

These geometries are illustrated in Fig. 2. Due to spherical symmetry, I(-b, a) refers to
the communication from a processor at a distance a from the origin to one a distance b
from the origin on the opposite side. Also by the symmetry assumption, ρ(r) = ρ(-r).

Equation  (6)  is  the  central  result  of  the  paper.  In  the  remaining  sections  we  consider 
some  important  special  cases.  First  we  look  at  the  limit  to  global  communication  imposed 
by  a  finite  communication  density  constraint.  Then  we  look  at  how  communication  density 
scales  with  processor  and  communication  distribution. 


3  Finite  Communication  Density  Constraint 

The first special case of interest is that of ρ constant and I(q, s) a function only of
the Euclidean distance d = |q - s|. That is, we have a model in which all of space is
filled with processors, each communicating with all of the others. We assume, however,
that the emphasis is on local (d small) communication. Our objective is to quantify the
requirement for locality. Assume that ρ = ρ_0 and that I(d) is a continuous bounded
function of non-negative d such that

\[ I(d) < \gamma\, (d_0 + d)^{-M} \tag{7} \]

for some positive constants γ, d_0, and M. In this case we say that communication falls
with order M. We have


\[ \Phi(0) = C_K \rho_0^2 \int_0^{\infty} \int_r^{\infty} (r^*)^{K-1} I(r^*)\, dr^*\, dr, \tag{8} \]

and therefore

\[ \Phi(0) < \gamma\, C_K \rho_0^2 \int_0^{\infty} \int_r^{\infty} (d_0 + r^*)^{K-1-M}\, dr^*\, dr. \tag{9} \]

As a result of (9), Φ(0) will converge if we meet the constraint that

\[ M > K + 1. \tag{10} \]

Specifically, in two dimensions the communication must fall off with order greater than 3
and in three dimensions, the order must exceed 4. d^-4 is a very rapidly decaying function
and indicates the extreme penalty for long distance communication. This constraint is
so severe because each processor on one side of the origin must communicate with every
processor on the other side of the origin. Equation (6) limits the degree to which even an
infinite number of processors can cooperate on a single problem.
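
The convergence condition can be checked by carrying out the two integrations in the bound (9) directly. The following is a worked sketch that assumes only the bound (7) and the constants γ, d_0, and M already defined; the closed form on the last line is derived here and is not quoted from the paper.

```latex
\begin{align*}
\int_r^{\infty} (d_0 + r^*)^{K-1-M}\, dr^* &= \frac{(d_0 + r)^{K-M}}{M-K}
  && \text{(requires } M > K\text{)}, \\
\int_0^{\infty} \frac{(d_0 + r)^{K-M}}{M-K}\, dr &= \frac{d_0^{\,K+1-M}}{(M-K)(M-K-1)}
  && \text{(requires } M > K+1\text{)}, \\
\Phi(0) &< \frac{\gamma\, C_K\, \rho_0^2\, d_0^{\,K+1-M}}{(M-K)(M-K-1)} .
\end{align*}
```

The inner integral needs only M > K, but the outer one adds a further power of distance, which is where the stricter requirement M > K + 1 comes from.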

The second special case of interest is that of I constant and ρ some continuous bounded
function of the distance from the origin r. Many multiprocessors, such as the BBN Butterfly,
strive to keep I constant so that the programmer does not have to deal with the
added complexity of communication locality. Assume that I = I_0. We have

\[ \Phi(0) = C_K I_0 \int_0^{\infty} \rho(r) \int_r^{\infty} (r^*)^{K-1} \rho(r^* - r)\, dr^*\, dr, \tag{11} \]

and therefore

\[ \Phi(0) > C_K I_0 \int_0^{\infty} \rho(r) \int_0^{\infty} (r^*)^{K-1} \rho(r^*)\, dr^*\, dr. \tag{12} \]

Noting that the total number of processors N is given by

\[ N = \int_V \rho(s)\, ds = C_K \int_0^{\infty} r^{K-1} \rho(r)\, dr, \tag{13} \]

we see that

\[ \Phi(0) > I_0 N \int_0^{\infty} \rho(r)\, dr. \tag{14} \]

From this it can be concluded that, with uniform communication, a finite communication
density requires a finite number of processors, as one would expect. For N to be finite,
ρ(r) must fall off with an order larger than K.


4 Communication Scaling

Assuming that a processor array is finite, equation (6) also indicates how communication
density scales with bandwidth requirements and array size. Consider the special
case where ρ is constant over a finite array of radius R. Assume as before that I is only
a function of distance d. Let I(d) have the constant value I_0 a^-M for 0 ≤ d ≤ a and be
I_0 d^-M for d > a, where a is some constant such that 0 < a < R. From (6) we arrive at

\[ \Phi(0) = C_K \rho_0^2 \int_0^{R} \int_r^{r+R} (r^*)^{K-1} I(r^*)\, dr^*\, dr. \tag{15} \]

Evaluating (15), we find that

\[ \Phi(0) = C_K I_0 \rho_0^2 \left( \beta + \delta R^{K-M+1} \right) \tag{16} \]

where

\[ \beta = \frac{M a^{K-M+1}}{(K+1)(M-K-1)}, \tag{16.1} \]

\[ \delta = \frac{2\,(2^{K-M} - 1)}{(M-K)(M-K-1)}, \tag{16.2} \]

when M is not equal to K or K + 1. Consistent with the results of the previous section, if
communication falls off with an order larger than K + 1, Φ(0) asymptotically approaches
the finite value C_K I_0 ρ_0² β as R is increased towards infinity. If M is smaller than
K + 1, (16) determines the rate at which Φ(0) approaches infinity with increasing radius
R.
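
The closed form for the finite array can be checked against a direct numerical evaluation of the double integral. The sketch below does this with a plain midpoint rule. The function names, the grid size n, and the sample parameters are all illustrative assumptions, and the common prefactor C_K I_0 ρ_0² is dropped from both sides.

```python
def profile(d, a, M):
    """I(d) / I_0: flat at a**-M out to distance a, then falling as d**-M."""
    return a ** -M if d <= a else d ** -M


def phi_numeric(K, M, a, R, n=300):
    """Midpoint-rule estimate of the double integral, prefactor omitted."""
    h = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h                # source distance from the origin
        for j in range(n):
            rs = r + (j + 0.5) * h       # r* runs from r to r + R
            total += rs ** (K - 1) * profile(rs, a, M)
    return total * h * h


def phi_closed(K, M, a, R):
    """beta + delta * R**(K-M+1); valid when M is neither K nor K + 1."""
    beta = M * a ** (K - M + 1) / ((K + 1) * (M - K - 1))
    delta = 2 * (2 ** (K - M) - 1) / ((M - K) * (M - K - 1))
    return beta + delta * R ** (K - M + 1)


# Illustrative case: planar array (K = 2), steep fall-off (M = 5).
print(phi_numeric(2, 5, 1.0, 4.0), phi_closed(2, 5, 1.0, 4.0))
```

For these sample values the two results should agree to about three decimal places. Setting M = 0 makes beta vanish and leaves a pure R^(K+1) growth, which is the uniform-communication case.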

In the special case where M = 0 (I a constant function of distance), we find that
β = 0. If the number of processors N is held constant, we have

\[ \Phi(0) \propto N^2 R^{1-K} \tag{17} \]

or

\[ \Phi(0) \propto N^2 V^{(1-K)/K}, \tag{18} \]

where V is the volume of the array. From (18) we can see how communication density
decreases with spreading of a processor array. With uniform communication among a fixed
number of processors, the communication density falls off as the volume to the power
(1 - K)/K. For a two dimensional
array on a planar integrated circuit, every factor of four increase in the area produces a
factor of two reduction in the communication density. If, on the other hand, the processor
density is held constant and the number of processors is increased by a factor of four, the
communication density is increased by a factor of eight.
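
The two planar examples follow from the uniform-communication scaling, in which the density varies as N² times the volume to the power (1 - K)/K. A minimal sketch, with a hypothetical helper name and only relative factors computed:

```python
def density_scale(n_factor, volume_factor, K):
    """Relative change in communication density: N**2 * V**((1 - K) / K)."""
    return n_factor ** 2 * volume_factor ** ((1 - K) / K)


# Planar array (K = 2): the same processors spread over four times the area.
spread = density_scale(1, 4, K=2)

# Constant processor density: four times the processors fill four times the area.
grow = density_scale(4, 4, K=2)

print(spread, grow)
```

Here `spread` corresponds to the factor-of-two reduction and `grow` to the factor-of-eight increase described above.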


5 Conclusion

We have presented a continuous model for communication density in large multiprocessor
arrays. Physical systems, regardless of their architecture, must contend with the
limits we described. While our analysis is only valid for straight line communication, the
average value of Φ can only increase when line-of-sight communication is not used.

Analysis  of  the  model  demonstrates  the  high  costs  of  communication.  This  does  not 
mean  that  global  communication  is  completely  unwarranted.  As  pointed  out  by  Leiserson 
[4],  there  are  some  algorithms  for  which  a  little  global  communication  is  worth  a  lot  of 
local  communication.  Similarly,  we  may  also  trade  off  greater  numbers  of  processors  with 
local communication against fewer numbers of processors with more global communication
bandwidth. In the context of any one problem, the number of processors, together
with the spatial characteristics and capacity for information exchange, can be evaluated
under the integral in (6) to determine the communication density costs of a particular
implementation.

Figure Captions

Fig.  1  The  information  flux  leaving  a  point  s  is  a  decreasing  function  of  distance. 

Fig. 2 Φ(0) is the sum of the communications from all processors s to those processors q
on the other side of the origin.

Abstract: We have demonstrated a programmable read only memory
cell  that  is  written  in  the  presence  of  unfocused  UV  light.  Once  the  UV 
light  is  removed,  the  programmed  state  is  non-volatile.  The  cell  uses 
conventional  MOS  processing. 

We have demonstrated a programmable read only memory cell that
can be written into the “1,” “0,” or “previous” state in the presence of
unfocused UV light. The programmed state is controlled by low electrical
voltages. Once the UV light is removed, the programmed state is non-volatile.
The memory cell uses conventional MOS processing with no
additional mask steps. The cell can thus be implemented on virtually
all silicon gate nMOS and CMOS processes. It was demonstrated in
4 μm nMOS using silicon foundry resources through the MOSIS facility.
Functional chips were fabricated by Comdial and AMI.

The cell, illustrated schematically in Fig. 1, takes advantage of the
fact that under UV excitation electrons can surmount the potential barrier
at the silicon/silicon dioxide interface. Therefore, when the light
impinges on the gate/source or gate/drain region of a MOS transistor,
photo-excited electrons flow through the gate oxide so as to equalize the
quasi-Fermi levels. We write both binary states rather than simply erase
(write a “0” into) a cell, as had been done in previous UV-PROMs. In
the circuit in Fig. 1, node Vbit is implemented entirely in polysilicon. This
is the floating gate. Only the floating gate area over the MOS capacitor
M1 is open to illumination. The rest of the node needs to be covered
by opaque material; in most implementations this would be aluminum.
A realistic layout is shown in Fig. 2. One must be careful to cover not
only M2 and M3 with metal, but also the poly over the field region.

* This research was supported by the Defense Advanced Research
Projects Agency under contract number N00014-80-C-0622.

Fig. 1 Schematic of UV Write-enabled PROM.

All  reading  and  writing  is  accomplished  with  power  supplied  to  the 
chip.  With  nMOS  technology  this  presents  no  problem  unless  one  is 
counting  on  dynamic  charge  storage  during  the  write  pulse.  With  CMOS 
technology one must beware of photon-induced latch-up; however, this
usually requires much higher light intensities than we are using. Write
times as short as 10 min were observed. During this time all cells may
be changed simultaneously, if desired.

To write a “1,” one raises the SET line and floods the chip with UV
light. Vbit then charges up to a high voltage, ideally Vdd, but typically
about 1.5 V lower. Vdd is the power supply voltage.

Experiments with 254 nm and 302 nm light indicate that the electrons
that charge and discharge the gate are not coming from the meagerly
populated silicon conduction bands, but rather from the valence
bands where electrons are much more numerous. The conduction band
to oxide barrier is usually taken as 3.1 eV. If the carriers were coming from
the conduction band then both wavelengths, corresponding to 4.8 eV and
4.1 eV, would enable write action in the PROM. We observed only the
shorter wavelength light to be effective. This is consistent with assuming
the carriers come from the valence band, in which case the barrier is
about 4.2 eV, too high for electrons excited with the longer wavelength
light.
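
The wavelength argument can be reproduced with the photon energy relation E = hc/λ. The sketch below uses the rounded constant hc ≈ 1239.84 eV·nm and the two barrier heights quoted in the text. The function names are illustrative, and the comparison deliberately ignores any details of the excitation process beyond a simple threshold.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm (rounded)

# Barrier heights quoted in the text, in eV.
BARRIERS_EV = {"conduction band": 3.1, "valence band": 4.2}


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a wavelength given in nm."""
    return HC_EV_NM / wavelength_nm


def can_surmount(wavelength_nm):
    """For each assumed carrier source, does the photon energy exceed the barrier?"""
    energy = photon_energy_ev(wavelength_nm)
    return {band: energy > barrier for band, barrier in BARRIERS_EV.items()}


for wavelength in (254, 302):
    print(wavelength, round(photon_energy_ev(wavelength), 2), can_surmount(wavelength))
```

The computed energies are about 4.9 eV and 4.1 eV, close to the values quoted. Only the 254 nm photon clears the valence-band barrier, while both clear the conduction-band barrier, which is the asymmetry the experiment relies on.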


Fig.  2  Layout  of  PROM  cell. 

To write a “0” one raises the RESET line instead of the SET line.
When the UV excitation is removed, the data is stored. To write the
previous state, which is important in cases when one only wants to
change a few bits of the programming, one leaves the SET and RESET
lines low while power and illumination are applied. The two cascaded
NOR gates will ensure that the last stored value is retained.
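
The three write operations can be summarized as a small behavioral model. This is a logic-level sketch only, not a circuit simulation. The function names are hypothetical, and the case of SET and RESET both high is treated as an error because the text does not describe it.

```python
def write_cycle(previous, set_line, reset_line, uv_on=True):
    """Return the stored bit after one write cycle of a single cell."""
    if not uv_on:
        return previous        # no illumination: the floating gate keeps its charge
    if set_line and reset_line:
        raise ValueError("SET and RESET both high is not described in the text")
    if set_line:
        return 1
    if reset_line:
        return 0
    return previous            # both lines low: the cell rewrites its last value


def write_array(stored, updates):
    """Flood-write an array, changing only the cells named in `updates`."""
    result = []
    for index, bit in enumerate(stored):
        target = updates.get(index)   # None means "keep the previous state"
        result.append(write_cycle(bit, target == 1, target == 0))
    return result


cells = [0, 1, 1, 0]
cells = write_array(cells, {0: 1, 2: 0})
print(cells)
```

Cells 1 and 3 are left with SET and RESET low, so they retain their contents while cells 0 and 2 are rewritten during the same exposure.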

Many  different  cells  are  possible.  The  simplest  (and  most  compact) 
cells  do  not  have  the  ability  to  hold  the  last  state  while  the  other  cells 
are  being  written;  they  can  only  be  set  or  reset.  A  cell,  such  as  the  one 
in  Fig.  2,  can  write  the  last  state,  but  the  SET  or  RESET  lines  must  be 
held  valid  for  a  significant  part  of  the  write  cycle. 

To  speed  the  writing  of  large  arrays,  one  can  write  the  data  first  into 
shadow  registers.  Cells  with  shadow  registers  can  write  the  last  state  and 
the  SET  and  RESET  lines  can  be  held  low  while  the  PROM  is  written. 
This  is  advantageous  because  it  lowers  the  number  of  I/O  pins  required 
for  a  given  write  time.  The  shadow  registers  can  be  quickly  written 
(sequentially) and then, when the light is turned on, one can transfer
their contents, in parallel, to the floating gates. A cell which does this is
illustrated in Fig. 3. SE enables the shadow latch.

Fig. 3 Schematic of PROM with shadow memory.


The UV Write-enabled PROM is a new field-programmable device that
allows  one  to  add  a  few  hundred  bits  of  non-volatile  storage  to  virtually 
any  silicon  gate  MOS  chip,  with  no  additional  masking  or  processing 
steps.  There  are  many  systems  applications  for  this  capability.  Typical 
applications  might  include  the  storage  of  cryptographic  codes,  special 
addresses,  calibration  data,  or  repair  locations  for  fault-tolerant  systems. 

The long term reliability of these devices is presently under investigation.
We have demonstrated months of storage time at room temperature.
Since the storage mechanism is precisely the same as that used in
many  commercial  UV-PROMs,  we  anticipate  few  problems.  The  oxides 
which  blanket  the  floating  gate  are  either  thermally  grown  or  very  thick. 
Previous  work  on  floating  gate  MOS  structures  is  highly  encouraging  in 
this  regard  [1,  2]. 

Figure  4  shows  a  photomicrograph  of  the  experimental  cell.  The 
layout  was  not  optimized.  We  now  understand  that  the  grounded  metal 
line  shading  Vbit  should  have  been  attached  to  the  OUTPUT  instead  of 
ground  because  it  provides  a  source  of  photo-excited  electrons  near  the 
floating  gate,  thereby  limiting  the  gate’s  maximum  potential.  Since  the 
floating gate is not charged to Vdd, a low trigger voltage on the SET
NOR  gate  is  required.  This  can  be  accomplished  by  making  the  pullup 


Fig. 4 Photomicrograph of experimental cell.

Bounding Techniques and Applications
for  VLSI  Circuit  Simulation 

Charles  A.  Zukowski,  John  L.  Wyatt,  Jr.,  and  Lance  A.  Glasser 

Abstract 

Simulation  of  large  digital  MOS  circuits  has  recently  been  made  faster  both  by  using  simplified 
device  models  and  by  using  algorithms  that  are  tailored  for  such  circuits.  A  bounding  approach  is 
presented  that  builds  upon  these  techniques  to  trade  speed  for  measured  uncertainty,  allowing 
uncertainty  to  be  managed  efficiently.  The  bounding  techniques  can  also  be  used  to  incorporate 
uncertainties  in  the  circuit  model  arising  from  variations  in  fabrication  process  and  operating 
conditions,  and  uncertainties  in  input  waveforms.  Monotonic  properties  of  the  MOS  circuit  model 
allow  rigorous  bounds  to  be  generated  through  the  analysis  of  simplified  circuit  models,  and 
waveform  relaxation  techniques  can  be  extended  to  efficiently  include  bounds  for  large  digital 
MOS  circuits.  When  applied  at  a  high  level,  bounding  algorithms  can  provide  a  useful 
enhancement  for  VLSI  circuit  simulation  programs. 

I.  Introduction 

The  size  and  complexity  of  VLSI  circuits  has  created  a  need  for  more  powerful  electrical  circuit 
simulation  tools  [1].  In  recent  years  research  towards  this  goal  has  fallen  into  two  major 
categories.  First,  since  today's  VLSI  circuits  are  primarily  digital  MOS,  simulation  algorithms  have 
been  investigated  that  exploit  the  special  properties  of  these  circuits  to  gain  efficiency.  The 
latency  of  digital  circuits  has  been  exploited  to  eliminate  unnecessary  calculations  corresponding 
to  inactive  portions  of  a  circuit.  Analysis  has  also  been  partitioned  to  achieve  a  linear  scaling  of 
computation  time  with  circuit  size.  Second,  since  high  accuracy  is  not  always  essential, 
algorithms  have  been  explored  that  trade  accuracy  for  speed.  Macromodeling  of  basic  cells,  and 
linear  and  piecewise  linear  device  models  have  been  used  to  reduce  complexity. 

Although  algorithms  tailored  to  digital  MOS  circuits  have  been  successful  in  reducing 
computation  time,  they  eventually  reach  a  limit.  Trading  accuracy  for  more  speed,  when  feasible, 
also  provides  an  attractive  tool  that  is  often  used  in  conjunction  with  digital  MOS  algorithms. 
Accuracy  can  be  sacrificed  in  many  instances  without  any  detrimental  effect,  as  in  the  analysis  of 
non-critical  logic  paths.  Unfortunately,  uncertainty  cannot  be  managed  effectively  in  standard 
approximating  simulators  because  it  is  not  measured.  Many  circuit  designers  continue  to  seek 
the reassurance of an "exact" simulator at great cost. Rough timing information can be very
useful in early design phases, but is not sufficient for high performance design. If a combinational
logic  block  were  being  designed  that  had  to  be  faster  than  10ns,  a  simulator  that  estimated  a  9ns 
delay  for  its  critical  path  would  not  be  helpful  unless  its  standard  deviation  were  only  a  few 
percent.  If  the  number  of  paths  with  a  delay  close  to  9ns  were  large,  even  more  statistical 
accuracy  would  be  required  to  achieve  high  confidence  that  the  specification  was  met.  If 
bounding  algorithms  were  used,  only  enough  computation  would  be  required  to  achieve  a  ten 
percent  level  of  accuracy  for  the  critical  paths,  regardless  of  the  number  considered. 

This  paper  presents  an  overview  of  our  recent  work  on  bounding  techniques  for  VLSI  circuit 
simulation.  Space  considerations  prevent  discussion  of  the  technical  details  here,  but  a  complete 
presentation is given in [2]. The main motivation to consider bounding techniques in VLSI circuit
simulation  is  to  further  improve  efficiency  through  uncertainty  management.  By  trading  measured 
accuracy  for  speed,  only  a  required  level  of  accuracy  need  be  achieved.  Fortunately,  present 
algorithms  for  both  efficient  and  approximate  simulation  of  large  digital  MOS  circuits  can  be 
extended  to  consider  efficient  bounds.  Bounding  can  be  viewed  as  a  framework  in  which  these 
tools  can  be  applied  rationally  to  a  particular  simulation  problem. 

In  addition  to  measuring  uncertainties  that  arise  from  simplified  analysis,  bounds  can  be  used  to 
incorporate  uncertainties  in  the  circuit  model  arising  from  variations  in  fabrication  process  and 
operating  conditions,  as  well  as  uncertainties  in  input  waveforms.  In  this  manner  "circuit 
simulation"  can  be  extended  into  the  realm  of  "circuit  analysis,"  in  which  an  entire  class  of 
circuits,  corresponding  to  ranges  of  device  models,  etc.,  is  considered  simultaneously.  For 
example,  bounds  can  incorporate  the  groups  of  excitations  considered  in  input-independent 
analysis. 

When  considering  general  problems,  it  is  often  difficult  to  generate  tight  bounds.  Digital  MOS 
circuits,  though,  provide  a  special  problem  in  which  the  difficulties  exhibited  in  more  general 
problems  can  be  avoided  to  a  large  extent.  The  second  section  discusses  the  feasibility  of  using 
bounding  in  VLSI  circuit  simulation  in  general  terms. 

In  addition  to  being  generally  feasible,  bounding  algorithms  for  digital  MOS  circuits  can  use 
modifications  of  existing  efficient  approximation  algorithms.  Due  to  special  monotonic  properties 
of  the  MOS  circuit  model,  simple  linear  or  piecewise  linear  circuit  models  can  be  derived  whose 
behavior  bounds  that  of  the  original.  By  using  algorithms  developed  to  analyze  these  simple 
circuit  models,  rigorous  upper  and  lower  bounds  on  signal  waveforms  can  be  generated, 
producing approximations with bounded uncertainty. The third section discusses the use of
simplified models to generate bounds.


When  considering  a  digital  MOS  circuit  model  with  feedback,  including  local  feedback 
produced  by  Miller  capacitance,  even  simplified  bounding  models  become  quite  complex.  To 
generate  tight  bounds,  simplified  models  must  also  contain  the  feedback  paths  present  in  the 
original  circuit.  The  techniques  developed  for  efficient  exact  analysis  of  such  circuits  can  also  be 
extended  to  include  bounds.  More  specifically,  the  Waveform  Relaxation  algorithm  [3]  can  be 
extended  to  include  waveform  intervals,  and  the  behavior  of  small  tightly  coupled  subcircuits  can 
be  bounded  separately.  The  partitioning  essential  for  efficient  VLSI  simulation  can  still  be 
achieved.  The  performance  of  the  algorithm  can  be  roughly  maintained  in  the  bounding  context, 
and  bounds  can  be  generated  for  even  sophisticated  circuit  models  with  a  computation  time  that 
scales  roughly  linearly  with  circuit  size.  The  extension  of  Waveform  Relaxation  to  include  bounds 
is  discussed  in  the  fourth  section. 

Experiments have shown that useful bounds can be constructed for many of the special
subcircuits  found  in  digital  MOS  logic.  These  results  are  discussed  briefly  in  the  fifth  section.  As 
more  sophisticated  bounding  algorithms  are  developed,  they  should  provide  a  useful 
enhancement  to  VLSI  simulation  programs.  Bounding  algorithms  have  the  nice  property  that  they 
never  give  an  incorrect  answer.  In  the  worst  case,  a  bounding  algorithm  might  not  provide 
sufficient  accuracy  in  a  given  subcircuit  at  an  acceptable  cost,  forcing  the  use  of  conventional 
exact  algorithms.  As  more  efficient  algorithms  are  developed  for  a  wider  range  of  subcircuits, 
they  can  be  incorporated  incrementally  to  improve  average  performance. 

II.  Digital  MOS  Circuits 

The  behavior  of  digital  MOS  circuits  can  be  successfully  bounded  at  the  level  of  logic  signal 
waveforms,  but  in  other  problems  bounding  has  proven  more  difficult.  A  simple  example  can 
illustrate  why  some  calculations  are  difficult  to  bound  efficiently.  Consider  first  the  calculation 
Y = X - X. If the variable X is known to lie in the interval [0,1], a straightforward bound on the
subtraction  operator  will  conclude  that  Y  must  lie  in  the  interval  [-1,1].  By  ignoring  any 
"correlations"  between  the  two  operands  of  the  subtraction,  calculations  are  made  feasible  but 
information  is  lost.  Y  must  be  zero  even  if  X  is  completely  unknown.  In  this  case  ignored 
correlations  amplify  uncertainty. 

A calculation that does not exhibit a correlation problem is Y = X + X. Knowledge that the two
operands must be identical does not change the conclusion that Y must lie in the interval [0,2] if X
lies in [0,1]. Uncertainty in the delay of restored signal waveforms is more nearly analogous to the
adder example. Rough bounds on delays have been used successfully by TTL circuit designers
for  years.  The  main  correlation  that  is  ignored  in  digital  logic  circuits  is  that  among  logic  signals. 
Roughly  speaking,  extreme  circuit  behaviors  arise  when  signals  and  devices  are  either  all  slow  or 
all  fast.  Unrealistic  situations  in  which  some  are  slow  and  others  fast  produce  behaviors  that  fall 
somewhere  in  between.  As  a  result,  ignoring  this  correlation  does  not  generally  amplify 
uncertainty. 
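
The subtraction and addition examples can be reproduced with a few lines of interval arithmetic. The `Interval` class below is a minimal illustration written for this purpose, not part of any simulator described here. It treats the two operands as independent, which is exactly the ignored correlation discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Sum is smallest when both operands are smallest, largest when both are largest.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Difference is smallest when self is smallest and other is largest.
        return Interval(self.lo - other.hi, self.hi - other.lo)


x = Interval(0.0, 1.0)
difference = x - x   # true value is exactly 0, but the bound spans [-1, 1]
total = x + x        # true range is [0, 2], and the bound matches it
print(difference, total)
```

The subtraction bound is much wider than the true answer, while the addition bound is tight. Calculations that behave like the adder lose nothing when the correlation is ignored.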

Basic  uncertainty  in  a  simulation  arises  from  the  use  of  incompletely  specified  models,  the  use 
of  simplification  to  speed  computation,  and  the  use  of  bound  relaxation  sequences  not  taken 
completely  to  convergence.  As  long  as  ignored  correlations  do  not  amplify  this  basic  uncertainty 
to  a  large  extent,  it  can  be  kept  quite  small  even  when  substantial  simplifications  are  made. 

We  have  found  that  there  are  additional  correlations  that  must  be  ignored  to  produce  simple 
and  rigorous  bounding  algorithms  for  complex  digital  MOS  circuit  models,  but  these  involve 
second  order  effects  for  restoring  logic  circuits.  Examples  are  correlations  between  signals  and 
their  derivatives,  correlations  between  values  of  derivatives  over  time,  and  correlations  between 
the  effects  of  coupling  elements  such  as  Miller  capacitors  on  the  two  subcircuits  they  connect. 
The  ignored  correlations  only  have  a  large  effect  for  certain  types  of  MOS  subcircuits.  For 
example,  bootstrap  drivers  depend  on  strong  correlations  among  variables  to  operate  correctly. 
Also,  circuits  such  as  ring  oscillators  with  strong  negative  feedback  are  sensitive  to  correlations 
among  signals,  even  when  considered  only  for  a  small  number  of  cycles.  Dynamic  nodes  with 
capacitive  coupling  need  bounds  on  total  charge  sharing  to  maintain  voltages  in  the  face  of 
correlations  over  time.  For  standard  restoring  logic  circuits,  though,  immunity  to  uncertainty 
amplification  makes  bounding  techniques  very  attractive.  Only  for  the  small  portion  of  a  circuit 
containing  difficult  circuits  is  virtually  exact  analysis  required,  and  these  can  potentially  be  found 
with  a  bounding  algorithm  by  observing  where  uncertainty  is  amplified. 

III.  Simplified  Bounding  Models 

The  monotonic  properties  of  the  MOS  circuit  model  can  be  used  to  bound  its  response  through 
the  simulation  of  a  simplified  model.  We  consider  monotonicity  of  circuit  behavior  with  respect  to 
both  input  waveforms  and  element  constitutive  relations.  A  monotonic  function  maps  bounds  on 
its  operands  into  rigorous  bounds  on  its  output,  and  nonlinear  input  waveforms  and  element 
constitutive  relations  can  be  tightly  bounded  by  piecewise  linear  ones.  As  a  result,  the  behavior  of 
any  circuit  with  the  desired  monotonicity  can  be  bounded  using  piecewise  linear  analysis. 
Computation  costs  are  similar  when  using  approximate  piecewise  linear  analysis,  but  uncertainty 
can  be  measured  when  bounds  are  used. 
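
A single RC node gives a small illustration of how monotonicity turns bounds on element values into rigorous bounds on a waveform. The sketch assumes an ideal unit step input and a linear resistor and capacitor, a far simpler case than the MOS models discussed here. The function names and component ranges are illustrative.

```python
import math


def rc_step(t, r, c, v_final=1.0):
    """Voltage at time t on an RC node charging from 0 V toward v_final."""
    return v_final * (1.0 - math.exp(-t / (r * c)))


def rc_step_bounds(t, r_range, c_range, v_final=1.0):
    """Bound the response using only the two extreme circuits.

    The charging waveform decreases monotonically as R or C increases,
    so the slowest and fastest corners enclose every circuit in between.
    """
    r_lo, r_hi = r_range
    c_lo, c_hi = c_range
    lower = rc_step(t, r_hi, c_hi, v_final)   # slowest circuit
    upper = rc_step(t, r_lo, c_lo, v_final)   # fastest circuit
    return lower, upper


# Illustrative 10 percent tolerances around 1 kilohm and 1 pF, observed at 1 ns.
low, high = rc_step_bounds(1e-9, (900.0, 1100.0), (0.9e-12, 1.1e-12))
print(low, high)
```

Two evaluations bound the whole family of circuits, and the gap between `low` and `high` is a measured uncertainty rather than an unknown error.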


It  is  assumed  here  that  a  VLSI  circuit  model  consists  of  discrete  resistors,  capacitors,  n-channel 
transistors, and p-channel transistors with continuous, monotonic constitutive relations. The
circuit  is  excited  with  external,  grounded  voltage  sources.  The  transistor  models  considered  are 
purely  resistive,  so  all  internal  transistor  capacitance  is  modeled  with  capacitors  at  the  transistor 
terminals.  Most  rough  or  sophisticated  MOS  circuit  models  currently  used  fall  into  this  class  of 
networks. 

To  use  waveform  relaxation  algorithms  along  with  nodal  analysis,  it  is  sufficient  to  assume  that 
all  nodes  with  capacitance  are  connected  to  each  other  through  some  path  of  finite,  positive 
incremental  capacitance.  In  an  integrated  circuit  model  where  every  node  is  connected  to  a 
grounded  capacitor,  this  condition  is  satisfied.  For  networks  that  do  not  satisfy  this  constraint, 
arbitrarily  small  capacitors  must  be  added.  The  nodes  in  the  capacitor  network  that  are  also 
connected to non-capacitive elements are called "independent" nodes, and their voltages are
used  to  represent  circuit  behavior. 

In  this  section,  only  the  behavior  of  small  subcircuits  is  considered,  as  relaxation  can  be  used  to 
combine  these  to  analyze  the  behavior  of  the  entire  network.  Consider  first  a  Waveform 
Relaxation  partitioning  [3]  that  corresponds  to  isolating  each  of  the  independent  nodes,  and 
using  the  behaviors  of  the  other  nodes  as  inputs.  In  this  case,  we  have  been  able  to  prove  [2]  that 
the  behavior  of  each  subcircuit  is  monotonic  in  its  input  waveforms  and  in  the  constitutive 
relations  of  its  elements  if  the  following  conditions  hold: 

•  Only  independent  nodes  are  connected  to  two  or  more  elements  of  different  type, 
e.g.,  resistors  and  n-channel  transistors. 

•  All  elements  form  series-parallel  "one-port"  groups  that  directly  connect  two 
independent nodes. (One-ports are defined for transistors as if they were resistors
from drain to source.)

•  All  transistor  gate  and  substrate  terminals  must  be  connected  to  independent  nodes, 
except  for  the  special  case  of  depletion  load  pullups. 

The  most  sophisticated  MOS  circuit  models,  containing  some  finite  grounded  capacitance  at 
every  node,  automatically  satisfy  all  of  these  constraints.  The  constraints  are  unrestrictive 
enough  that  most  intermediate  complexity  models  that  ignore  or  group  various  capacitances  are 
also  included.  Any  circuit  that  violates  one  of  the  constraints  can  be  transformed  into  a  slightly 
more  complex  legal  one  with  the  addition  of  arbitrarily  small  grounded  capacitors  and  arbitrarily 
large  grounded  resistors.  As  a  result,  the  solution  for  each  independent  node  behavior  in  a  MOS 
integrated circuit can be bounded by the solution of a much simpler circuit, e.g., one containing
piecewise linear input waveforms and devices.

Partitioning  of  MOS  circuits  usually  is  more  efficient  when  taken  only  to  the  level  of  strongly 
connected  clusters.  A  cluster  is  defined  roughly  as  a  group  of  nodes  connected  by  a  d.c.  current 
path that does not pass through the power supply. To extend the monotonic relationships to
larger  partitions,  a  circuit  transformation  must  be  used.  Any  element  that  connects  two  nodes 
within  the  partition  must  generally  be  duplicated  to  separate  its  effect  on  each  node.  A  grounded 
dependent  source  is  connected  to  each  instance  of  the  element  to  represent  the  effect  of  the 
other  node.  After  such  a  transformation,  the  cluster  solution  becomes  a  monotonic  function  of 
both  its  input  waveforms  and  element  constitutive  relations,  if  duplicated  elements  are  treated  as 
distinct.  For  many  cluster  circuits  that  contain  transistors,  the  correlations  that  are  ignored  as  a 
result  of  this  transformation  can  represent  a  significant  source  of  information  loss. 

Many  special  cluster  circuits  that  appear  often  in  simplified  circuit  models  have  more  powerful 
monotonic  properties  that  have  been  investigated.  Many  elements  do  not  need  to  be  duplicated 
to  guarantee  monotonicity,  removing  an  ignored  correlation  and  further  simplifying  the  bounding 
circuit  at  the  same  time.  One  important  case  that  we  have  considered  is  an  RC  tree  driven  by  a 
restoring  logic  gate,  where  the  resistors  are  fixed,  there  is  no  internodal  coupling  capacitance, 
the  pullup  and  pulldown  networks  have  no  internal  capacitance,  and  the  output  is  monotonic  in 
time [4]. The special properties of nonlinear RC lines, meshes, and trees with various restrictions
have  also  been  considered  in  detail  [5],  [6],  [7].  If  the  response  of  a  cluster  can  be  bounded  by 
that  of  a  linear  RC  circuit,  results  exist  that  allow  the  response  of  the  linear  circuit  to  be  bounded 
with  simple  closed-form  expressions  [8],  [9]. 

IV.  Bound  Relaxation 

The  key  to  efficient  analysis  of  digital  MOS  circuits  is  partitioned  analysis.  For  intermediate 
complexity  circuit  models  with  no  feedback  through  MOS  transistors,  event  driven  simulation  has 
been  used  along  with  a  straightforward  partitioning  into  clusters.  Bounding  the  response  of  such 
circuit  models  can  take  advantage  of  the  same  techniques  in  a  straightforward  manner, 
propagating  bounds  from  inputs  of  clusters  to  outputs. 

The  Waveform  Relaxation  algorithm  is  an  example  of  a  partitioning  method  for  general  circuit 
models  that  performs  well  in  exact  simulation.  Each  tightly  coupled  subcircuit,  corresponding 
roughly  to  a  logic  block,  is  analyzed  separately  over  large  time  intervals  to  iteratively  improve 
estimates  for  the  solution.  Latency  is  exploited  by  using  variable  time  steps  across  subcircuits, 
convergence is fast for most digital MOS circuits, and computation cost scales roughly linearly
with circuit size.

Waveform Relaxation can be extended to handle intervals of waveforms with performance
similar  to  that  of  the  exact  algorithm  [10].  As  a  result,  the  efficiency  and  linear  scaling  arising 
from  partitioning  can  be  used  when  bounding  complex  circuit  models.  For  bounding 
applications,  the  advantage  of  restricting  analysis  to  small  subcircuits  appears  to  be  large,  as  the 
complexity  of  generating  direct  bounds  appears  to  grow  very  quickly  with  circuit  size. 

The  exact  Waveform  Relaxation  algorithm  is  based  on  a  "relaxation  function"  that  maps  one 
estimate  of  circuit  behavior  to  a  better  one  based  on  the  exact  response  of  each  subcircuit.  We 
consider  a  "bounding  function,"  a  map  from  intervals  of  circuit  behavior  to  new  intervals  based 
on  bounds  on  the  responses  of  each  subcircuit.  The  results  in  the  previous  section  show  how 
bounding  functions  can  be  calculated  efficiently.  The  bounding  function  is  defined  so  that  its 
output  interval  contains  the  image  of  its  input  interval  under  a  relaxation  function.  We  have  been 
able  to  show  that  such  a  bounding  function  has  two  important  properties: 

•  A  bounding  function  maps  a  rigorous  bound  on  the  solution  into  another  rigorous 
bound. 

•  A  tentative  bound  on  the  solution  which  is  mapped  to  a  tighter  one  by  a  bounding 
function  is  guaranteed  to  be  a  rigorous  bound. 

A  tight  bounding  function  roughly  maintains  the  contraction  property  of  the  exact  algorithm 
upon  which  it  is  based.  If  a  conservative  guess  for  the  solution  can  be  generated,  a  bounding 
function  can  be  used  both  to  check  if  it  is  a  valid  bound  and  to  tighten  it  through  repeated 
relaxation.  Relaxation  using  a  bounding  function  will  tend  to  converge  quickly  to  an  interval  that 
is self-consistent, i.e., the bounding function maps the interval to itself. If the sequence of
intervals  is  telescoping  from  a  very  conservative  initial  guess,  any  element  in  the  sequence  is  a 
valid  bound.  If  the  sequence  is  taken  to  convergence,  the  solution  corresponds  to  the  direct 
solution  obtained  by  ignoring  correlations  between  subcircuits. 
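
The two properties of a bounding function can be illustrated on a much simpler problem than waveform relaxation. The sketch below is an analogy under stated assumptions, not the algorithm of [10]: the "circuit" is a static fixed-point problem x = A x + b with a nonnegative matrix A (standing in for positive feedback between subcircuits), and the names `relax`, `bound`, and `tighten` are hypothetical. Because A is nonnegative, `relax` is monotone, so mapping the two endpoints of an interval gives an interval that contains the image of every point inside it.

```python
import numpy as np

# Hypothetical two-"subcircuit" problem: x = A x + b, with A >= 0 elementwise.
A = np.array([[0.0, 0.5],
              [0.25, 0.0]])
b = np.array([1.0, 1.0])

def relax(x):
    """Exact relaxation function: one estimate of the solution to a better one."""
    return A @ x + b

def bound(lo, hi):
    """Bounding function: valid because A >= 0 makes relax monotone."""
    return relax(lo), relax(hi)

def tighten(lo, hi, iters=20, tol=1e-12):
    """Check a tentative interval and tighten it by repeated relaxation.

    Raises ValueError if the interval is not mapped into itself, in which
    case nothing can be concluded about it.
    """
    history = [(lo, hi)]
    for _ in range(iters):
        new_lo, new_hi = bound(lo, hi)
        if not (np.all(new_lo >= lo - tol) and np.all(new_hi <= hi + tol)):
            raise ValueError("tentative bound is not mapped into itself")
        lo, hi = new_lo, new_hi
        history.append((lo, hi))
    return history

if __name__ == "__main__":
    steps = tighten(np.zeros(2), np.full(2, 10.0))
    print("final interval:", steps[-1])
```

Starting from the very conservative guess [0, 10] in each component, every interval in the telescoping sequence contains the exact solution, and the width shrinks at the contraction rate of the underlying exact iteration.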

V.  Experimental  Results 

Some  experimental  programs  have  been  written  to  assess  the  feasibility  of  a  bounding 
approach.  Algorithms  appropriate  for  intermediate  complexity  models  of  gate  arrays  have 
measured  delay  uncertainty  in  the  range  of  ±30%  with  simplifications  corresponding  to  using  RC 
bounding circuits for each gate [4]. Accuracy can be increased by using piecewise constant
transistor  bounds  with  any  number  of  segments.  The  cost  of  measuring  the  uncertainty  in 
approximating  gate  array  simulation  algorithms  is  not  large,  and  a  bounding  algorithm  can  find 
critical paths and produce tight bounds on the outputs without wasting computation where it is
not  necessary. 

A  program  to  relax  bounds  on  general  linear  RC  mesh  circuits  has  also  been  developed  [11]. 
Since  the  relaxation  can  be  calculated  exactly  in  the  linear  case  and  feedback  is  positive  between 
subcircuits,  precise  knowledge  of  the  element  values  and  initial  state  leads  to  a  sequence  of 
closed-form  bounds  that  becomes  arbitrarily  tight.  The  algorithm  can  be  used  to  tighten  direct 
closed-form  bounds  on  RC  trees  for  circuits  in  which  they  are  loose,  or  handle  "leaky"  RC 
meshes  with  arbitrary  initial  state  not  treated  by  direct  bounding  algorithms.  Experiments  using 
the  program  have  indicated  that  even  within  small  clusters,  relaxation  techniques  can  have 
acceptable  performance,  producing  delay  bounds  that  are  tighter  than  conventional  ones  [8]  after 
only  a  few  iterations. 

Lastly,  simple  bounding  algorithms  for  general  MOS  circuit  models  have  been  tested  on 
combinational  logic  circuits  containing  Miller  capacitors.  The  bounding  algorithm  for  subcircuit 
responses  uses  piecewise  constant  transistor  models  and  produces  piecewise  linear  voltage 
waveforms.  Using  simplified  models  that  generate  delay  uncertainty  on  the  order  of  ±5%  for 
circuits  without  Miller  capacitance,  the  addition  of  Miller  capacitance  only  increases  delay 
uncertainty by roughly a factor of two or three. In addition, at the ±5% level of accuracy, bound
relaxation  exhibits  convergence  behavior  similar  to  that  of  exact  relaxation. 

VI.  Conclusion 

A  theoretical  framework  has  been  established  that  should  lead  to  efficient  bounds  on  the 
behavior  of  a  large  portion  of  typical  digital  MOS  integrated  circuits.  A  bounding  approach  to 
simulation  has  many  advantages,  and  initial  experiments  regarding  feasibility  have  been  very 
encouraging.  More  experience  with  a  simulation  program  based  on  bounding  techniques  is 
needed to guide further research in the field. Towards that end, work is required on choosing
detailed  strategies  for  bounding  simulators  and  high  level  algorithms  for  managing  uncertainty 
once  it  can  be  measured.  More  powerful  results  are  also  desirable  for  general  cluster  circuits, 
important  special  subcircuits,  and  simplified  models. 
Computationally simple bounds for signal
propagation delay in linear RC tree models for MOS
interconnect were derived in [1] and have proved
useful in timing analysis of digital MOS IC's [2-4].
We show that these bounds can be derived quite
simply as the payoff functions for a certain linear
optimal control problem and that they apply not only
to RC trees but to more general RC meshes as well.
Finally, two methods are given for tightening the
original bounds given in [1].
In digital integrated circuits, signal propagation
delay through conducting paths grows in
relative importance as feature sizes shrink. Bounds
on the delay, applicable to those paths that can be
modelled as "RC trees," were derived in [1]. But,
as discussed in [5-7], certain circuits used in MOS
logic cannot be modelled as RC trees because they
contain one or more loops of resistors, as shown in
Fig. 1. Several examples of such circuits, called
"RC meshes," arising in MOS logic networks are given
in [5,6].

This paper is concerned with bounds on the
zero-state step response and hence on signal propagation
delay in linear, lumped RC tree and mesh networks
driven by an ideal voltage source.

RC Meshes

Fig. 2: The resistor subnetwork R.

Consider the step response. Substituting $e = 1$
and $i_k = -C_k \dot v_k$ into (1) yields the network
differential equations

$$ 1 - v_i(t) = \sum_{k=1}^{N} r_{ik}\, C_k\, \dot v_k(t), \qquad i = 1, \ldots, N, \quad t \ge 0, \qquad (2) $$

which are identical in form to eq. (9) of [1]. The
only difference is that in [1] certain resistances
$R_{ki}$, defined specifically in terms of the topology
of a tree, appear in place of the $r_{ik}$'s above.

III. Control Method
The derivation outlined below is an alternative
to that in [1] and yields essentially the same
results,  but  it  applies  to  meshes  as  well  and  also 
affords  a  natural  way  to  incorporate  additional 
information  and  thereby  obtain  tighter  bounds,  as 
shown  in  Section  IV. 


Fig. 1: Linear RC mesh.


Isolate  the  resistor  subnetwork  R  containing 
all  the  resistors  and  assign  reference  directions 
to  the  capacitor  currents  as  shown  in  Fig.  2.  The 
node  voltages  with  respect  to  datum  are  given  in 
terms  of  the  positive-definite,  symmetric  matrix  R 
as shown below.


For any three nodes i, j, k of an RC mesh,

$$ r_{ii}\, r_{kj} \ge r_{ki}\, r_{ij}. \qquad (3) $$

The proof of (3), given in [6], generalizes the
argument for the special case of a tree in [1].
The zero-state step response of an RC mesh is
completely monotone, i.e.,

$$ \dot v_j(t) \ge 0, \qquad j = 1, \ldots, N, \quad \forall t \ge 0. \qquad (4) $$

The proof is in [6].


For any two nodes i and k of an RC mesh and any
instant t during the step response,

$$ r_{ii}\,[1 - v_k(t)] \ge r_{ki}\,[1 - v_i(t)], \qquad (5) $$

$$ r_{ki}\,[1 - v_k(t)] \le r_{kk}\,[1 - v_i(t)]. \qquad (6) $$

Given Facts 1 and 2, the derivation of (5) and
(6) is identical to that in Appendix D of [1].

At this point the strategy becomes one of
reduced order modelling with time-domain error
bounds. Choosing a distinguished node i as the output
node of interest, we describe the system in terms
of only two state variables, the distance to
equilibrium $(1 - v_i(t))$ and its integral

$$ f_i(t) \triangleq \int_t^{\infty} [1 - v_i(\tau)]\, d\tau = \sum_k r_{ik}\, C_k\, [1 - v_k(t)], \qquad (7) $$

where the last equality follows from substituting
(2) into the integral and evaluating. Using (5) and
(6) in (7) yields the following inequality between
these two state variables:

$$ T_{Ri}\,[1 - v_i(t)] \le f_i(t) \le T_P\,[1 - v_i(t)], \qquad T_{Ri} = \frac{1}{r_{ii}} \sum_k r_{ki}^2\, C_k, \quad T_P = \sum_k r_{kk}\, C_k. \qquad (8) $$

From (7) one initial condition is

$$ f_i(0) = \sum_k r_{ik}\, C_k \triangleq T_{Di}. \qquad (9) $$

It was shown in [1] that step response bounds
can be obtained by appropriate manipulations of (4,
5) and (7-9) above, but the methodology is somewhat
obscure. A clearer view emerges from recasting the
calculations into the form of a linear minimum-
(and maximum-) time optimal control problem with
state constraints, in which an input u(t) is
introduced to represent the unknown waveform $-\dot v_i(t)$:

Minimize (or maximize) T

for the dynamical system

$$ \dot f_i(t) = -[1 - v_i(t)], \qquad (10) $$

$$ \frac{d}{dt}[1 - v_i(t)] = u(t), \qquad (11) $$

with initial conditions

$$ f_i(0) = T_{Di}, \qquad 1 - v_i(0) = 1, \qquad (12) $$

state constraints

$$ T_{Ri}\,[1 - v_i(t)] \le f_i(t) \le T_P\,[1 - v_i(t)], \qquad \forall t \ge 0, \qquad (13) $$

input constraint

$$ u(t) \le 0, \qquad \forall t \ge 0, \qquad (14) $$

and terminal condition

$$ 1 - v_i(T) = 1 - v_i^*, \qquad 0 \le v_i^* < 1. \qquad (15) $$

Fig. 3: Fastest and slowest trajectories.

The optimal trajectories can be determined by
inspection, since the time duration of any path in
the $(1 - v_i)$ - $f_i$ plane can be found by rearranging
and integrating (10) to yield

$$ T = \int_{f_{i,\mathrm{final}}}^{f_{i,\mathrm{init}}} \frac{d f_i}{1 - v_i}. \qquad (16) $$

Thus the fastest trajectory from the initial point
to the target interval is the one for which both the
region of integration $(f_{i,\mathrm{final}}, f_{i,\mathrm{init}})$ and the
integrand $(1 - v_i)^{-1}$ are minimized, and the slowest
trajectory is found similarly. See Fig. 3. The
minimum and maximum times depend on the "target"
voltage $v_i^*$ and are denoted $T_{\min}(v_i^*)$ and $T_{\max}(v_i^*)$.
The inverse functions, denoted respectively $\bar v_i(t)$
and $\underline{v}_i(t)$, are the upper and lower bounds,
respectively, for the step response of the mesh. The
algebraic form of $\underline{v}_i(t)$ and $\bar v_i(t)$ obtained in this
way can be easily read off from Fig. 3 and agrees
with the results in [1]; the exact expressions are
omitted for the sake of brevity. They approach a
well-defined limit in the case of a distributed
network, e.g., the simple example in Fig. 4, for which
they are plotted in Fig. 5.


Fig. 4: The bounds approach a well-defined limit

for  a  distributed  network  such  as  this  one,  for  which 

T_  »  1.33  ns.,  T_  *  1.6  ns.,  T„  »  2.0  ns,  and  T.  » 

P  D  P  R . 

ii  i 

0.33  ns . 


Fig. 5: Step response bounds for the network in Fig.
4, with output taken at node i.

The bounds represent an effort to approximate
the dynamics of a higher order network by one with
a single time constant $T_{Di}$; they are exact only in
that case. Whenever $(T_P - T_{Ri}) \ll T_{Di}$, the
wedge-shaped region in Fig. 3 is quite narrow and the
bounds will be quite tight. Chapter 3 of [7] gives
examples of networks for which the bounds are good
and others where they are poor.
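
The three time constants and the wedge-shaped constraint on the two state variables can be checked numerically on a small example. The sketch below is illustrative only: it assumes a lumped RC line (a special case of a tree), takes $T_P = \sum_k r_{kk} C_k$, $T_{Di} = \sum_k r_{ik} C_k$ and $T_{Ri} = \sum_k r_{ki}^2 C_k / r_{ii}$ as in [1], and solves $r\,C\,\dot x = -x$ for $x = 1 - v$ by an eigendecomposition. All function names are hypothetical.

```python
import numpy as np

def line_resistance_matrix(R):
    """r[i, k] = resistance shared by the source-to-i and source-to-k paths."""
    cum = np.cumsum(np.asarray(R, dtype=float))
    idx = np.arange(len(cum))
    return cum[np.minimum.outer(idx, idx)]

def time_constants(r, C, i):
    """Return (T_Ri, T_Di, T_P) for output node i."""
    C = np.asarray(C, dtype=float)
    T_P = float(np.sum(np.diag(r) * C))
    T_D = float(np.sum(r[i] * C))
    T_R = float(np.sum(r[i] ** 2 * C) / r[i, i])
    return T_R, T_D, T_P

def distance_to_equilibrium(r, C, t):
    """x[k, n] = 1 - v_k(t[n]) for the zero-state unit step response."""
    C = np.asarray(C, dtype=float)
    s = 1.0 / np.sqrt(C)
    # C^(-1/2) r^(-1) C^(-1/2) is symmetric, so eigh applies.
    S = s[:, None] * np.linalg.inv(r) * s[None, :]
    lam, V = np.linalg.eigh(S)
    coeff = V.T @ np.sqrt(C)            # initial condition x(0) = 1
    decay = np.exp(-np.outer(lam, np.atleast_1d(t))) * coeff[:, None]
    return s[:, None] * (V @ decay)

def check_wedge(R, C, i, t, tol=1e-9):
    """Verify T_Ri (1 - v_i) <= f_i <= T_P (1 - v_i) at the sample times t."""
    r = line_resistance_matrix(R)
    T_R, T_D, T_P = time_constants(r, C, i)
    x = distance_to_equilibrium(r, C, t)
    f_i = (r[i] * np.asarray(C, dtype=float)) @ x
    ok = np.all(T_R * x[i] <= f_i + tol) and np.all(f_i <= T_P * x[i] + tol)
    return bool(ok), (T_R, T_D, T_P)

if __name__ == "__main__":
    ok, consts = check_wedge([1, 1, 1, 1], [1, 1, 1, 1], 1, np.linspace(0, 30, 61))
    print(ok, consts)
```

For a four-section line with unit elements and the output at the second node, the wedge is fairly narrow relative to $T_{Di}$, which is the regime in which the text expects tight bounds.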


IV. Method "A" for Bounds Improvement: Limits on
the Maximum Slew Rate of Node Voltages

The optimal trajectories shown in Fig. 3 include
horizontal segments along which $(1 - v_i)$ changes while $f_i$
remains constant, corresponding to instantaneous
jumps in $v_i$ that cannot occur in practice. We can
tighten the bounds by adding constraints eliminating
such trajectories. The simplest form for such a
constraint is a "minimum slope bound" in the
$(1 - v_i)$ - $f_i$ plane of the form

$$ \frac{d f_i}{d(1 - v_i)} \ge \tau_i > 0. \qquad (17) $$

This rules out both trajectories in Fig. 3 as
feasible solutions. The new optimal trajectories
are as shown in Fig. 6, and the corresponding
algebraic form for $\underline{v}_i(t)$ and $\bar v_i(t)$ is given in [9].

The inequality (17) corresponds to a "slew-rate
bound," i.e., a bound on the derivative, for $v_i$,
since

$$ \frac{\dot v_i}{1 - v_i} = \frac{-\frac{d}{dt}(1 - v_i)}{1 - v_i} = \frac{\frac{d}{dt}(1 - v_i)}{\dot f_i} = \frac{d(1 - v_i)}{d f_i} \le \frac{1}{\tau_i}. \qquad (18) $$


Fig.  6:  Altered  optimal  trajectories. 


For any mesh we know that (17) holds with $\tau_i = r_{ii} C_i$, since

$$ 1 - v_i = \sum_{j=1}^{N} r_{ij}\, C_j\, \dot v_j \ge r_{ii}\, C_i\, \dot v_i, $$

from (2) and (4). Using $\tau_i = r_{ii} C_i$ can significantly tighten the
bounds whenever the mesh contains only a small number
of lumped capacitors, as is commonly the case in
reasonably accurate circuit models for distributed
interconnect [10]. But as progressively more R's
and C's are used to model a given section of
interconnect, $C_i \to 0$ and (17) becomes useless with
$\tau_i = r_{ii} C_i$.
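
The slew-rate bound with $\tau_i = r_{ii} C_i$ can be checked numerically. The sketch below is a hedged illustration, not part of the original method: it assumes a lumped RC line, computes the exact zero-state step response by an eigendecomposition, and compares $\dot v_i / (1 - v_i)$ with $1/(r_{ii} C_i)$. The function name and the element values in the usage example are hypothetical.

```python
import numpy as np

def slew_ratio_and_bound(R, C, i, t):
    """Return (v_i'(t) / (1 - v_i(t)), 1 / (r_ii C_i)) for an RC line."""
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    idx = np.arange(len(R))
    # r[i, k] is the resistance common to the paths from the source to i and k.
    r = np.cumsum(R)[np.minimum.outer(idx, idx)]
    s = 1.0 / np.sqrt(C)
    lam, V = np.linalg.eigh(s[:, None] * np.linalg.inv(r) * s[None, :])
    coeff = V.T @ np.sqrt(C)                       # x(0) = 1 - v(0) = 1
    decay = np.exp(-np.outer(lam, np.atleast_1d(t))) * coeff[:, None]
    x = s[i] * (V[i] @ decay)                      # 1 - v_i(t)
    xdot = -s[i] * (V[i] @ (lam[:, None] * decay)) # d/dt (1 - v_i)
    return -xdot / x, 1.0 / (r[i, i] * C[i])

if __name__ == "__main__":
    t = np.linspace(0.0, 20.0, 201)
    for node in range(3):
        ratio, limit = slew_ratio_and_bound([1, 2, 1], [1, 0.5, 2], node, t)
        print(node, float(ratio.max()), limit)
```

At the node nearest the source the bound is met with equality at t = 0, where all the current through the first resistor charges that node's capacitor. Splitting the line into more sections shrinks each $C_i$ and loosens the bound, which is the weakness the text points out.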

Fortunately, values of $\tau_i$ greater than $r_{ii} C_i$
can be found for many RC trees. Space constraints
limit us to mentioning only one of the results in
this direction obtained in [11]. Consider an RC
line with the nodes numbered in increasing order as
one moves away from the source. It was first noted
in [7] that for such a network

$$ \frac{\dot v_j(t)}{1 - v_j(t)} \ge \frac{\dot v_i(t)}{1 - v_i(t)}, \qquad \forall j \le i, \quad \forall t \ge 0; \qquad (19) $$

a rigorous proof was given and the result
further extended in [11]. Using (19) and (5) in (2),
one can show that if i is any node of an RC line, or
any node of an RC tree between the source and the
first branch point, then


Ti 


(23) 


This  improvement  is  illustrated  ir.  Fiu 


V. Method "B" for Bounds Improvement: Spatial
Convexity of Node Voltages

At any instant during the step response of an
RC line or tree, the node voltages are a convex
function of distance from the source. For the
network in Fig. 4, a characteristic voltage profile is
plotted in Fig. 7, along with the bounds (5) and (6).


Fig.  7:  Voltage  profile  for  Fig.  4. 

Considerable improvement over (6) is possible since
a convex curve is bounded below by any tangent line,
i.e.,

$$ 1 - v_k \le (1 - v_i) \left[ 1 + \lambda \left( \frac{r_{kk}}{r_{ii}} - 1 \right) \right] \qquad (21) $$

for some $\lambda \in [0, 1]$. Substituting (21) into the right
hand side of (7) and taking the maximum over $\lambda$
yields
thus reducing the effective value of $T_P$ (from 2.00
ns to 1.03 ns for the network in Fig. 4) and
further improving the voltage bounds as shown in
Fig. 5. Current research includes extending this
technique to trees.
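
The substitution step in Method "B" can be sketched as follows. This is a reconstruction under stated assumptions, not the authors' printed result: it assumes the tangent-line bound has the form $1 - v_k \le (1 - v_i)[1 + \lambda(r_{kk}/r_{ii} - 1)]$ with a single $\lambda \in [0, 1]$ shared by all nodes k, that (7) reads $f_i = \sum_k r_{ik} C_k (1 - v_k)$, and that $T_{Di} = \sum_k r_{ik} C_k$. The symbol $T_P'$ is introduced here only for illustration.

```latex
\begin{align*}
f_i(t) &= \sum_k r_{ik}\, C_k\, [1 - v_k(t)] \\
       &\le [1 - v_i(t)] \sum_k r_{ik}\, C_k
            \left[ 1 + \lambda \left( \frac{r_{kk}}{r_{ii}} - 1 \right) \right] \\
       &= [1 - v_i(t)] \left[ (1 - \lambda)\, T_{Di}
            + \lambda\, \frac{1}{r_{ii}} \sum_k r_{ik}\, r_{kk}\, C_k \right].
\end{align*}
% The bracket is linear in \lambda, so its maximum over [0, 1] is at an endpoint:
\[
f_i(t) \le T_P'\, [1 - v_i(t)], \qquad
T_P' = \max \left\{ T_{Di},\; \frac{1}{r_{ii}} \sum_k r_{ik}\, r_{kk}\, C_k \right\}.
\]
```

Under these assumptions $T_P' \le T_P$ whenever $r_{ik} \le \min(r_{ii}, r_{kk})$, as holds for RC lines and trees, so the upper edge of the wedge can only move inward.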
Abstract

Schema  provides  an  integrated  environment  for  all  aspects 
of  the  synthesis  and  analysis  of  electronic  designs  from  PC 
boards  through  circuit  and  mask  design  of  VLSI  devices.  It 
simplifies  the  development  of  synthesis  and  analysis  tools  by  us¬ 
ing  uniform  data  structures  and  by  making  available  libraries  of 
standard  routines  and  advanced  control  structures  appropriate 
for  CAD  tool  development.  Because  all  tools  in  the  Schema  en¬ 
vironment  utilize  the  same  abstract  data  structures  it  is  easy  for 
tools to interchange data about a design or even pieces of the
design  itself.  Schema  also  permits  much  of  the  design  to  be  done 
in  a  technology  independent  fashion  by  allowing  the  designer  to 
delay  implementation  decisions  until  the  last  possible  moment. 
The  information  associated  with  a  particular  component  of  a 
design  is  organized  as  a  module.  Modules  contain  schematics, 
icons,  topologies,  layouts,  simulation  results,  and  other  descrip¬ 
tive  information  for  this  component.  The  descriptions  contained 
in  modules  are  implemented  as  procedures  which  utilize  other 
modules  in  a  hierarchical  fashion.  Schema  is  under  joint  devel¬ 
opment  by  MIT  and  Harris  Corporation. 


SCHEMA  is  an  environment  for  developing  knowledge 
based,  computer  aided  design  tools  for  electronic  systems. 
The  three  major  goals  of  its  design  are: 

• Provide an integrated environment for all aspects of
the  synthesis  and  analysis  of  electronic  designs  from 
PC  boards  through  circuit  and  mask  design  of  VLSI 
devices. 

•  Simplify  the  creation  of  computer  aided  design  tools 
by  encouraging  and  supporting  their  construction  from 
libraries  of  standard  routines,  by  using  uniform  data 
structures  and  by  providing  libraries  of  advanced  con¬ 
trol  structures  appropriate  for  CAD  development. 

• Allow the designer to delay making decisions until
necessary; for example, the technology (TTL, ECL, gate
array  or  custom  MOS)  used  in  a  logic  design  need  not 
be  specified  until  timing  simulations  or  physical  design 
is  begun. 

The  key  to  achieving  these  goals  is  the  development  of  a 
totally  integrated  design  environment  where  design  tools 
easily  communicate  and  cooperate.  This  has  been  achieved 
by  the  innovative  software  architecture  used  in  the  devel¬ 
opment  of  Schema. 

SCHEMA  achieves  coherence  not  by  specifying  the  in¬ 
terchange  formats  to  be  used  between  different  CAD  pro¬ 
grams,  but  rather  by  specifying  the  data  structures  the  pro¬ 
grams  should  use.  SCHEMA  specifies  a  set  of  abstract  data 
types for dealing with electronic designs, and a set of policies
to be used when dealing with the new data types. This
approach provides a common layer on which different CAD
tools may be built, and it allows the CAD tools to invoke each
other and easily cooperate by interchanging pieces of
electronic designs.

These data types are implemented using an object
oriented programming system called Flavors⁴. These
structures represent circuit topologies and schematics, mask
artwork, floorplans and simulation waveforms (both digital
and analog). Circuit topologies represent the connectivity
of a circuit; schematics are representations of the graphic
images of a circuit that are drawn on paper. Since these
structures are instances of flavors, they also incorporate
pieces of code that allow them to directly provide
procedural functionality. That is, a transistor contains the
information and code required to display itself on the screen, write
itself out to a file or participate in a simulation. This raises
the semantic level at which the CAD tools deal with objects,
simplifying their development. It also allows
implementation and operation decisions to be delayed and even changed
without modifying the code that makes use of them.

Modules 

The basic component of a design in SCHEMA is a
module. Each module consists of a topology and several
descriptions, e.g., schematics, icons, layouts and simulation
results. Examples of modules in a design include: an
inverter, a half adder, an arithmetic logic unit, a data path, a
cache, an instruction fetch unit and a memory system. Each of
these modules includes not only the schematic (and its
corresponding topology), but also the results of various tests
that have been performed on the circuit (simulation
results), documentation and design notes and physical
specifications (VLSI layouts or PC board designs). The modules
represent a complete view of a design component.

The  designer  rarely  interacts  directly  with  the  topol¬ 
ogy  of  a  module,  but  instead  deals  with  the  descriptions 
(schematics).  The  analysis  tools  (simulators,  timing  veri¬ 
fiers  and  other  consistency  checkers)  work  with  the  topol¬ 
ogy,  and  usually  use  the  descriptions  only  for  communicat¬ 
ing  with  the  designer.  The  only  major  exceptions  are  the 
physical  design  tools,  VLSI  layout  system  and  wire  wrap 
and  PC  board  systems  that,  by  necessity,  must  work  with 
the  physical  descriptions. 

The  system  ensures  that  the  topology  remains  consis¬ 
tent  with  the  descriptions  provided  by  the  designer  through 
the  use  of  timestamps  and  limited  edit  trails.  It  also  warns 
the designer when two descriptions of a design become
inconsistent. This division allows the electronic designer to
use the most appropriate mechanism for describing the
design without worrying about getting formats correct for the
CAD tools, and the CAD tool designer deals only with
design descriptions that are both appropriate and “pre-parsed.”

The topology and its descriptions are implemented as
procedures, though they are usually edited via one of the
description editors: schematic, layout or waveform. This
procedural structure, similar to the approach used in DPL¹,
allows a great deal of flexibility in parameterizing the different
components and provides an excellent point at which to
install intelligent synthesis modules. For instance, in an
earlier version of SCHEMA this was used to implement an
ALU module that chose different carry look-ahead schemes
depending on the width of the data word².

These hierarchical descriptions also incorporate a
multiple viewpoint or Slices³ mechanism to allow simulation
and analysis modules to annotate the topologies. The
multiple viewpoints are used to control the visibility of
certain information to the CAD tools. For instance, transient
analysis programs like SPICE want to be aware of parasitic
capacitances and resistances while a simple logic
simulator might not. Rather than generating two different
topologies for the different simulators, the same topology is
used for both but the parasitics are only visible when the
transient viewpoint is made visible by SPICE. This way
annotations to the topology made by the two simulators can
be examined by their counterparts easily.
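
The viewpoint idea can be sketched in a few lines. The following is a hypothetical Python illustration only (SCHEMA itself is built on Flavors, and none of these class or method names come from it): one topology holds every element, each element is tagged with the viewpoint that owns it, and a tool sees only the base elements plus those in the viewpoints it has activated.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Element:
    name: str
    kind: str
    viewpoint: str = "base"   # annotation layer the element belongs to

@dataclass
class Topology:
    elements: list = field(default_factory=list)

    def add(self, name, kind, viewpoint="base"):
        self.elements.append(Element(name, kind, viewpoint))

    def visible(self, *active):
        """Names of the elements seen by a tool with these viewpoints active."""
        layers = {"base", *active}
        return [e.name for e in self.elements if e.viewpoint in layers]

if __name__ == "__main__":
    inverter = Topology()
    inverter.add("M1", "transistor")
    inverter.add("M2", "transistor")
    inverter.add("Cpar", "capacitor", viewpoint="transient")
    print(inverter.visible())             # what a logic simulator would see
    print(inverter.visible("transient"))  # what a transient simulator would see
```

Keeping a single topology means an annotation left by one tool sits on the same objects the other tool reads, which is the benefit the paragraph above describes.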

Project/Module  Hierarchy 

The  modules  and  all  other  information  relating  to  a 
design  are  collected  into  a  project,  which  in  turn  can  be  a 
component  of  a  larger  project.  For  instance,  there  might 
be  an  L  machine  project  that  is  used  to  hold  all  the  design 
components of the L machine. Several different versions of
the L machine might be designed, so there might be TTL,
CMOS and ECL sub-projects of L machine. Within the
CMOS project there might be different projects to contain the
design  of  the  datapath,  control  logic,  and  memory  manage¬ 
ment  system.  The  modules  of  each  of  these  projects  would 
be  combined  by  the  main  module  contained  in  CMOS  to 
produce  the  final  chip. 

Each  designer  maintains  his  or  her  own  hierarchy  of 
projects.  The  root  of  this  hierarchy  is  called  a  portfolio.  By 
having  sub-projects  point  to  the  same  save  file,  designers 
can share projects. This project/module hierarchy is a very
useful  way  of  organizing  and  managing  the  material  related 
to  a  design. 

Environments 

By  specifying  an  environment  the  designer  makes  pre¬ 
cise  what  types  of  modules  and  tools  should  be  available 
for  the  design.  Each  environment  consists  of  a  collection 
of  primitive  modules  that  may  be  used,  command  dispatch 
tables  for  the  description  editors,  design  rules,  simulation 
models and so on. The environments themselves are organized
as a directed acyclic graph. At any time, the designer
can  refine  the  environment  being  used.  For  instance,  one 
could  begin  a  design  in  the  Basic  Logic  environment  and 
later  when  it  had  been  decided  to  use  CMOS,  switch  to 
the  Generic  CMOS  environment.  Finally,  when  a  foundry 
had been chosen, the designer would select an environment
for the specific process to be used. While the environment
was Basic Logic, the designer would be able to draw logic
schematics and run simulations, but would be unable to get any
timing  information  (other  than  in  gate  delay  units)  or  do 
any  circuit  design.  After  switching  to  Generic  CMOS,  tran¬ 
sistor  level  circuits  and  sticks  diagrams  could  be  developed. 
When  the  process  specific  environment  has  been  chosen, 
detailed  masks  could  be  designed  and  accurate  timing  in¬ 
formation  would  be  available. 
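
Environment refinement along a directed acyclic graph can be sketched as below. This is a hypothetical Python illustration, not SCHEMA code: the class, its methods, the primitive names, and the process-specific environment name are all invented for the example, and it assumes that a refined environment inherits everything available in the environments it refines.

```python
class Environment:
    """A node in a directed acyclic graph of design environments."""

    def __init__(self, name, primitives=(), parents=()):
        self.name = name
        self.primitives = set(primitives)
        self.parents = tuple(parents)

    def ancestors(self):
        """All more general environments reachable through parent links."""
        seen, stack = set(), list(self.parents)
        while stack:
            env = stack.pop()
            if env not in seen:
                seen.add(env)
                stack.extend(env.parents)
        return seen

    def available(self):
        """Primitives usable here: this environment's plus all inherited ones."""
        result = set(self.primitives)
        for env in self.ancestors():
            result |= env.primitives
        return result

    def refines(self, other):
        """True if switching from `other` to this environment is a refinement."""
        return other is self or other in self.ancestors()

if __name__ == "__main__":
    basic = Environment("Basic Logic", {"logic schematic", "logic simulation"})
    cmos = Environment("Generic CMOS", {"transistor circuit", "sticks diagram"}, [basic])
    process = Environment("Example Process CMOS", {"mask layout", "accurate timing"}, [cmos])
    print(sorted(process.available()))
    print(cmos.refines(basic), basic.refines(cmos))
```

A designer's move from Basic Logic to Generic CMOS to a process-specific environment is then a walk down the graph, and each step only adds capabilities.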
Software Tools


Though most of the time the data structures needed
by a CAD designer are already in place, SCHEMA also
includes a large library of compatible flavors (abstract data
types) for constructing new structures. Within this library
are mechanisms for dealing with many different types of
hierarchy, prototypes, “creation on demand,” timestamping,
and so on. When creating a new data structure, the
designer merely picks the flavors that provide the
functionality desired and includes his own customization. This fine
grained modularity has helped maintain a high level of
uniformity within the system. The modularity techniques used
are based on the Capsule ideas*.

In  addition  there  is  also  a  growing  library  of  useful 
CAD  oriented  procedures  that  may  be  drawn  upon.  Among 
them  are  sparse  matrix  routines,  linear  and  non-linear  equa¬ 
tion  solvers,  a  moderate  size  symbolic  algebra  package, 
topological traversal routines, two dimensional spatial management
packages and so on. The existence of these packages
has enabled CAD builders to build on each other's work
more than in previous systems.

The totality of these tools, mechanisms and policies
removes much of the drudgery from CAD tool development
and encourages tool developers to proceed in a cooperative,
cumulative fashion. For the electronic designer it
provides  a  uniform  environment,  with  uniform  access  to  a 
wide  variety  of  different  synthesis  and  analysis  tools. 

A  simple  transient  simulator  was  built  on  this  base 
by  Chris  Terman.  An  example  of  its  use  is  shown  on  the 
preceding  page.  The  results  of  the  simulation  are  left  as 
annotations  on  the  topology  that  the  user  (or  another  pro¬ 
gram)  can  examine.  The  top  left  window  shows  the  circuit 
being  simulated,  the  top  right  one  shows  a  few  selected 
waveforms.  In  the  bottom  window,  the  currents  into  the 
depletion transistors are given.


The second figure illustrates the use of the linear
systems analysis tools. The bottom window gives the exact
transfer function of the RLC circuit shown in the top left
window. At the right, several Bode plots are given, and a
pop up menu is shown which gives the parameters of the last
plot.

Conclusions 

We have given a brief summary of the internal architecture
of SCHEMA and shown a few of its uses. Its novel
architecture and extremely high degree of integration make
SCHEMA easier to use, for both CAD tool designers and
circuit designers, than many other systems.
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Step  Response  Bounds  for  Systems  Described  by  M-Matricea.  with  Application 
to  Timing  Analysis  of  Digital  MOS  Circuits* 
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ABSTRACT 


Methods  adapted  from  optimal  control  are  used  to  calculate  simple  closed- 
form  upper  and  lower  bounds  for  the  step  response  of  linear  systems  governed  by 
M-matrices.  For  high  order  systems  these  bounds  can  be  calculated  using  far 
less  computer  time  than  an  "exact"  numerical  solution  requires.  The  technique 
has  proven  to  be  of  real  industrial  significance  in  integrated  circuit  CAD, 
where  it  is  used  for  rapid  calculation  of  signal  propagation  delay  through  the 
myriad  of  interconnect  paths  in  digital  MOS  IC's. 
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Methods  adapted  from  optimal  control  are  used  to 
calculate  simple  closed-form  upper  and  lower  bounds  for 
the  step  response  of  linear  systems  governed  by 
M-matrices.  For  high  order  systems  these  bounds  can  be 
calculated  using  far  less  computer  time  than  an  "exact" 
numerical  solution  requires.  The  technique  has  proven 
to  be  of  real  industrial  significance  in  integrated 
circuit  CAD,  where  it  is  used  for  rapid  calculation  of 
signal  propagation  delay  through  the  myriad  of  inter¬ 
connect  paths  in  digital  MOS  IC’s. 

I.  Introduction 


Simple closed-form expressions in terms of the entries
of M and b will be derived for lower and upper bounds
$\underline{x}_i(t)$ and $\bar{x}_i(t)$ such that

$$\underline{x}_i(t) \le x_i(t) \le \bar{x}_i(t), \quad \forall t \ge 0, \tag{3}$$

when u(·) is a unit step at t = 0.

We note that the eigenvalues of any M-matrix have
strictly positive real parts [6,8], so (1) is
necessarily stable. And it turns out that the step
response of (1) is necessarily monotone nondecreasing,
although the eigenvalues of M need not be real in general.

II.  State  Equations  in  the  Form  Required 


Linear systems of differential equations with
dynamics governed by M-matrices are commonly used to
describe certain diffusion-type systems arising in
chemical engineering and biophysics as well as a certain
class of electrical RC circuits. Numerically calculating
the step response of high order M-matrix systems can
consume large amounts of computer time: this is an urgent
practical problem in the timing analysis of digital MOS
integrated circuits. Driven by this application, a
special technique has been developed for rapidly calculating
bounds on the step response of a class of
linear electrical networks known as RC trees [1], and
the results have found widespread practical use, e.g.,
[2-4]. This paper presents a generalization that
applies to a larger class of dynamical systems governed
by arbitrary M-matrices.

Many equivalent characterizations of M-matrices can
be found in the literature, e.g., [5-11]. We select the
following one as a definition because it relates directly
to certain manipulations in this paper, and we restrict
attention to the nonsingular case.


Definition 1

Let A be a nonsingular square matrix of real numbers
with $B \triangleq A^{-1}$. Then A is said to be an M-matrix if

$$a_{jk} \le 0, \ \forall j \ne k, \quad \text{and} \quad b_{jk} \ge 0, \ \forall j,k$$

[5].
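
The definition above translates directly into a numerical test. The following is a minimal sketch, assuming NumPy is available; the function name `is_m_matrix` and the tolerance `tol` are illustrative choices, not part of the original paper.

```python
import numpy as np

def is_m_matrix(A, tol=1e-12):
    """Test the definition in the text: A is nonsingular, its off-diagonal
    entries are <= 0, and every entry of B = inv(A) is >= 0.
    `tol` is an assumed floating-point tolerance."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    off_diag = A - np.diag(np.diag(A))    # zero out the diagonal
    if np.any(off_diag > tol):            # some a_jk > 0 with j != k
        return False
    try:
        B = np.linalg.inv(A)
    except np.linalg.LinAlgError:         # exactly singular matrix
        return False
    return bool(np.all(B >= -tol))        # b_jk >= 0 for all j, k
```

For example, `is_m_matrix([[2, -1], [-1, 2]])` is true because the inverse, (1/3)[[2, 1], [1, 2]], is entrywise nonnegative, whereas `[[1, -2], [-2, 1]]` has the right sign pattern but an inverse with negative entries.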


The step response bounds will involve the elements
of $P \triangleq M^{-1}$. Multiplying (1) by P and rearranging gives

$$P b u - x = P \dot{x}. \tag{4}$$

Note that for a unit step input, $\lim_{t \to \infty} x(t) = P b$. Thus
we define $x_{eq} \triangleq P b$ and obtain the system description

$$x_{eq} - x(t) = P \dot{x}(t), \quad \forall t \ge 0, \tag{5}$$

$$x(0) = 0. \tag{6}$$


Note that $x_{eq} \ge 0$ as a consequence of (1) and Def. 1.
In fact $x_{eq} > 0$ since P is nonsingular and $b \ne 0$.

If the state equations are originally given in the
form (1), then obtaining P requires a time-consuming
matrix inversion and obtaining $x_{eq}$ requires multiplying
a matrix by a vector. But we show in Section VI that
these computations can be avoided in a significant class
of problems of practical interest, for which the entries
of P and $x_{eq}$ can be obtained directly by inspection of
the physical system model.
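
For small systems, the "exact" step response that the bounds are meant to replace can be computed directly. The sketch below is an illustration only: it assumes NumPy, assumes M is diagonalizable, and uses the closed form x(t) = (I - exp(-Mt)) x_eq for the zero-state unit step response of (1); the name `step_response` is hypothetical.

```python
import numpy as np

def step_response(M, b, times):
    """Zero-state unit step response of x' = -M x + b u, evaluated as
    x(t) = (I - exp(-M t)) x_eq with x_eq = inv(M) b.
    Assumes M is diagonalizable (eigendecomposition is used for exp)."""
    M = np.asarray(M, dtype=float)
    b = np.asarray(b, dtype=float)
    x_eq = np.linalg.solve(M, b)          # x_eq = P b without forming P
    lam, V = np.linalg.eig(M)             # M = V diag(lam) inv(V)
    c = np.linalg.solve(V, x_eq)          # x_eq expressed in the eigenbasis
    X = np.array([x_eq - (V @ (np.exp(-lam * t) * c)).real for t in times])
    return x_eq, X                        # X[n] is x(times[n])

M = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([1.0, 0.0])
times = np.linspace(0.0, 20.0, 201)
x_eq, X = step_response(M, b, times)
```

On this two-state example every component starts at zero, rises monotonically, and settles at x_eq, in line with the stability and monotonicity properties stated for (1). The cost of the eigendecomposition grows quickly with system order, which is the motivation for the bounds.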


The symbol M will denote any M-matrix, and P will
denote $M^{-1}$. For two vectors $a, b \in \mathbb{R}^m$, $a \ge b$ means
$a_j \ge b_j$, $j = 1, \ldots, m$, and $a > b$ means $a \ge b$ and $a \ne b$.
In the dynamical system description (1) below, $x \in \mathbb{R}^N$,
M and b are of dimensions compatible with x, and the
subscript i is reserved to denote the distinguished
component of x selected as the output.

This paper presents a computationally fast method
for bounding the zero-state step response of high order
single-input single-output linear systems of the form

$$\dot{x} = -M x + b u, \quad b > 0. \tag{1}$$


III. Useful Inequalities for M-Matrix Systems

Fact 1

The zero-state step response of (1) is monotone in
time, i.e.,

$$\dot{x}(t) \ge 0, \quad \forall t \ge 0. \tag{8}$$

Fact 2

If $P \in \mathbb{R}^{N \times N}$ is the inverse of an M-matrix, then

$$p_{ii}\, p_{kj} \ge p_{ki}\, p_{ij}, \quad \forall i, j, k, \tag{9}$$

where i, j and k are not necessarily distinct.

Facts 1 and 2 are proved in the Appendix.
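
Fact 2 can be spot-checked numerically on a small example. This is a sketch under stated assumptions: NumPy is available, the helper name `fact2_holds` and the tolerance are illustrative, and a passing check on one matrix is of course not a proof.

```python
import numpy as np

def fact2_holds(P, tol=1e-12):
    """Check p_ii * p_kj >= p_ki * p_ij for every index triple (i, j, k)."""
    P = np.asarray(P, dtype=float)
    d = np.diag(P)
    lhs = np.einsum('i,kj->ijk', d, P)    # lhs[i, j, k] = p_ii * p_kj
    rhs = np.einsum('ki,ij->ijk', P, P)   # rhs[i, j, k] = p_ki * p_ij
    return bool(np.all(lhs >= rhs - tol))

# A tridiagonal M-matrix and its (entrywise nonnegative) inverse.
M = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
P = np.linalg.inv(M)
```

Here `fact2_holds(P)` is true, with equality in some triples, which is why a small tolerance is used. A nonnegative matrix that is not an inverse M-matrix, such as [[1, 2], [2, 1]], fails the test.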

Lemma 1

Let $x_i = y$ be the distinguished output variable of
(5)-(7). Then at any instant during the transient the
relation between $x_i(t)$ and every other state variable
$x_k(t)$ is governed by the inequalities

$$p_{ii}\,(x_{k,eq} - x_k(t)) \ge p_{ki}\,(x_{i,eq} - x_i(t)), \tag{10}$$

$$p_{ik}\,(x_{k,eq} - x_k(t)) \le p_{kk}\,(x_{i,eq} - x_i(t)), \tag{11}$$

$\forall i, k \in \{1, \ldots, N\}$, $\forall t \ge 0$.

Proof

Expanding the k-th and i-th rows of (5) gives

$$x_{k,eq} - x_k(t) = \sum_{j=1}^{N} p_{kj}\, \dot{x}_j(t), \tag{12}$$

$$x_{i,eq} - x_i(t) = \sum_{j=1}^{N} p_{ij}\, \dot{x}_j(t). \tag{13}$$

Multiply (12) by $p_{ii}$ and (13) by $p_{ki}$ and subtract:

$$p_{ii}\,(x_{k,eq} - x_k(t)) - p_{ki}\,(x_{i,eq} - x_i(t)) = \sum_{j=1}^{N} (p_{ii}\, p_{kj} - p_{ki}\, p_{ij})\, \dot{x}_j(t) \ge 0, \tag{14}$$


where the last inequality follows from (8) and (9).
This proves (10), and (11) follows by interchanging the
roles of k and i.


IV. Construction of a Reduced Order Model by Introducing
a New State Variable


Assume $x_{i,eq} > 0$, since otherwise the step response
at the output is identically zero. The goal of this
section is to construct a simple second order model of
the higher order system (5)-(7) in terms of the
variables $f_i$ and $g_i$, where $g_i$ is a normalized version
of the output,

$$g_i(t) \triangleq \frac{x_{i,eq} - x_i(t)}{x_{i,eq}}, \tag{15}$$

and $f_i$ is the new state variable

$$f_i(t) \triangleq \int_t^{\infty} g_i(\tau)\, d\tau, \tag{16}$$

as shown in Fig. 1.

Fig. 1 The variable $f_i(t)$ is the "area to go" under
the normalized output $g_i(t)$.

To construct a reduced order model of the system we
introduce an artificial input w(t) to represent the
unknown waveform $\dot{g}_i(t)$.

Reduced Order Model

Defining $\dot{g}_i \triangleq w$, we assemble (15) and (16) into
the reduced order model

$$\frac{d}{dt} \begin{bmatrix} f_i \\ g_i \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} f_i \\ g_i \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} w, \tag{17}$$

with the constraints and initial conditions below.

Input Constraints

From (8), (15) and the fact that $x_{i,eq} > 0$, we
conclude that

$$w(t) \le 0, \quad \forall t \ge 0. \tag{18}$$

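Initial Conditions

From (15),

$$g_i(0) = 1. \tag{19}$$

From (13) and (16),

$$f_i(0) = \int_0^{\infty} g_i(\tau)\, d\tau = \frac{1}{x_{i,eq}} \int_0^{\infty} \sum_{k=1}^{N} p_{ik}\, \dot{x}_k(\tau)\, d\tau = \frac{1}{x_{i,eq}} \sum_{k=1}^{N} p_{ik}\, x_{k,eq} \triangleq T_{D_i}. \tag{20}$$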


The time constant $T_{D_i}$ is frequently called the
"Elmore delay" [12] in the literature, and the reader
can easily verify that it equals the first moment of
the normalized impulse response $\dot{x}_i(t)/x_{i,eq}$. For
this reason it is used in some applications as an estimate
of signal propagation delay through the system [13, 14].


Using the notation in (1), define the additional
time constants

$$T_P \triangleq \sum_{k=1}^{N} p_{kk}, \tag{21}$$

$$T_{R_i} \triangleq \frac{1}{p_{ii}} \sum_{k=1}^{N} p_{ik}\, p_{ki}. \tag{22}$$

Note that $T_{R_i} \le T_{D_i} \le T_P$. In fact, these inequalities
hold term-by-term for the sums in (20)-(22)
defining the time constants, as the reader can verify
by evaluating (10) and (11) at t = 0.
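
The three time constants are inexpensive to evaluate once the entries of P and x_eq are known. The following is a minimal sketch, assuming NumPy and zero-based indexing of the output component; the function name `time_constants` is illustrative, and the example matrix is an arbitrary small M-matrix rather than one taken from the paper.

```python
import numpy as np

def time_constants(P, x_eq, i):
    """Return (T_R, T_D, T_P) for output component i, following (20)-(22)."""
    P = np.asarray(P, dtype=float)
    x_eq = np.asarray(x_eq, dtype=float)
    T_D = (P[i, :] @ x_eq) / x_eq[i]        # (20): sum_k p_ik x_k,eq / x_i,eq
    T_P = np.trace(P)                       # (21): sum_k p_kk
    T_R = (P[i, :] @ P[:, i]) / P[i, i]     # (22): sum_k p_ik p_ki / p_ii
    return T_R, T_D, T_P

M = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 1.0, 1.0])
P = np.linalg.inv(M)
x_eq = P @ b
T_R, T_D, T_P = time_constants(P, x_eq, i=0)
```

For this example the values are T_R = 7/6, T_D = 5/3 and T_P = 5/2, so the ordering T_R ≤ T_D ≤ T_P noted above is visible directly.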


Lemma 2

The solution to (5)-(7) is such that the state
variables $f_i$ and $g_i$ of (17) satisfy the

State Constraints

$$T_{R_i}\, g_i(t) \le f_i(t) \le T_P\, g_i(t), \quad \forall t \ge 0. \tag{23}$$

Proof

From (13), (15) and (16),

$$f_i(t) = \frac{1}{x_{i,eq}} \sum_{k=1}^{N} p_{ik}\,(x_{k,eq} - x_k(t)). \tag{24}$$

Using (10) in (24) yields $f_i \ge T_{R_i}\, g_i$, and using (11) in
(24) yields $f_i \le T_P\, g_i$.


Equations  (17)  -(23)  define  the  reduced-order  description 
of  the  system  (5)- (7)  and  can  be  analysed  with  a 
minimum  of  computation.  See  Fig.  2.  The  cost  of  this 
simplification  is  the  introduction  of  uncertainty,  as 
represented  by  the  inequalities  (18)  and  (23) .  Thus 
we  obtain  only  bounds  on  the  true  output  rather  than 
an  exact  solution. 
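
The state constraints (23) can be checked against an exactly computed trajectory for a small system. The sketch below is illustrative only: it assumes NumPy, a diagonalizable M, and the closed form x(t) = (I - exp(-Mt)) x_eq; the function name `state_constraints_hold` and the tolerance are assumptions, and $f_i$ is evaluated from (24) rather than by numerical integration.

```python
import numpy as np

def state_constraints_hold(M, b, i, times, tol=1e-9):
    """Check T_R * g_i(t) <= f_i(t) <= T_P * g_i(t) at the given times."""
    M = np.asarray(M, dtype=float)
    b = np.asarray(b, dtype=float)
    P = np.linalg.inv(M)
    x_eq = P @ b
    T_P = np.trace(P)                         # (21)
    T_R = (P[i, :] @ P[:, i]) / P[i, i]       # (22)
    lam, V = np.linalg.eig(M)                 # assumes M is diagonalizable
    c = np.linalg.solve(V, x_eq)
    for t in times:
        x = x_eq - (V @ (np.exp(-lam * t) * c)).real   # exact x(t)
        g = (x_eq[i] - x[i]) / x_eq[i]                  # (15)
        f = (P[i, :] @ (x_eq - x)) / x_eq[i]            # (24)
        if not (T_R * g - tol <= f <= T_P * g + tol):
            return False
    return True

M = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 1.0, 1.0])
ok = state_constraints_hold(M, b, i=0, times=np.linspace(0.0, 10.0, 101))
```

On this example the trajectory stays inside the wedge for all sampled times, starting at $(f_i, g_i) = (T_{D_i}, 1)$ and approaching the origin.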


[Fig. 2 legend: typical trajectory, estimate trajectory, state constraints]

Fig. 2 The step response of a typical high order system
of the form (1)-(2) has the general appearance
shown above when plotted in the $f_i$-$g_i$ plane.
To save computer time we avoid calculating the
exact trajectory and use only a) the initial
conditions (19) and (20), b) the state
constraints (23) as drawn above, and c) the
knowledge that the trajectory must move downward
because of the input constraint (18) for
the reduced order model, and to the left because
$\dot{f}_i = -g_i \le 0$. The "estimate trajectory" is
simple in form and satisfies all these
conditions.

V. Optimal Control Method for Determining Step
Response Bounds

To calculate bounds on the step response, we first
determine the maximum and minimum times at which the
state variable $g_i(t)$ in (17) can reach any given
"target" value $g_i^* \in (0, 1]$, subject to the constraints
(18)-(23). This is a linear minimum- and maximum-time
optimal control problem with a fixed initial condition,
inequality constraints on the state and input, and
terminal condition $g_i = g_i^*$. The optimal trajectories
can be determined without recourse to Pontryagin's
maximum principle because the time duration t of any
path in the $f_i$-$g_i$ plane can be calculated by rearranging
and integrating $df_i/dt = -g_i$ to yield

$$t = \int_{f_{final}}^{f_{initial}} \frac{1}{g_i}\, df_i. \tag{25}$$

Thus the fastest trajectory from the initial state
to the "target" interval $(T_{R_i} g_i^* \le f_i \le T_P\, g_i^*,\ g_i = g_i^*)$
is the one for which both the width $\Delta f_i = f_{initial} - f_{final}$
of the region of integration and the integrand
$1/g_i$ are minimized. The slowest trajectory is found
similarly, and both are displayed in Figs. 3-5. The
maximum and minimum times, denoted $t_{max}(g_i^*)$ and
$t_{min}(g_i^*)$ because they depend on the "target" value $g_i^*$,
can be calculated directly from the figure using (25).
The reader will be spared the algebraic details, which
can easily be reproduced if needed: the resulting
expressions have appeared in [1] for a slightly special
case and can be found in [15] for the general case.

[Fig. 3 legend: slowest and fastest trajectories]

Fig. 3 Form of the maximum- and minimum-time
trajectories for the case $T_{D_i}/T_P \le g_i^* \le 1$. The
common initial state is marked with a dot, the
"target" set is the portion of the horizontal
line at $g_i = g_i^*$ lying within the state constraints,
and the terminal states for the two trajectories
are marked with small circles. The minimum-time
trajectory drops vertically to the target
in zero time. The maximum-time trajectory drops
immediately to a point infinitesimally above the
target and then moves horizontally to the left
at constant velocity.


[Fig. 4 legend: slowest and fastest trajectories]

Fig. 4 Form of the maximum- and minimum-time trajectories
for the case $T_{R_i}/T_P \le g_i^* < T_{D_i}/T_P$. The
fastest trajectory first moves horizontally at
constant velocity and then drops vertically to
the right hand edge of the target. The slowest
trajectory first drops vertically to the lower
constraint boundary, proceeds along that boundary
until it is infinitesimally above the target, then
moves horizontally to the left at constant velocity.

Fig. 5 Form of the extremal trajectories for
$0 < g_i^* < T_{R_i}/T_P$. The slowest trajectory remains
similar to Fig. 4, but the fastest trajectory
now encounters the upper constraint boundary
and proceeds along it before dropping vertically
and instantaneously to the right hand edge of
the target.

Fortunately, both $t_{min}(g_i^*)$ and $t_{max}(g_i^*)$ can be
inverted explicitly to yield upper and lower bounds
$\bar{g}_i(t) = t_{max}^{-1}(t)$ and $\underline{g}_i(t) = t_{min}^{-1}(t)$ for $g_i(t)$. These
in turn yield lower and upper bounds $\underline{x}_i(t)$ and $\bar{x}_i(t)$
on the step response $x_i(t)$:

$$\underline{x}_i(t) \triangleq x_{i,eq}\,(1 - \bar{g}_i(t)) \le x_i(t) \le \bar{x}_i(t) \triangleq x_{i,eq}\,(1 - \underline{g}_i(t)), \quad \forall t \ge 0.$$

The final results are given below and their form is plotted in
Fig. 6.

Lower Bound


Fig.  6  Form  of  the  step  response  bounds  (26)  and  (27). 
The  distance  between  them  is  exaggerated  for 
clarity. 


Single Time Constant Estimate for the Step Response

The simple time function

$$\hat{x}_i(t) \triangleq x_{i,eq}\,\left(1 - e^{-t/T_{D_i}}\right), \quad \forall t \ge 0,$$

generates a straight line from the initial condition to
the origin when plotted in the $f_i$-$g_i$ plane, as shown in
Fig. 2. When plotted against t, it shares the following
features with the exact step response $x_i(t)$: both have
value zero at t = 0 and rise smoothly and monotonically to
$x_{i,eq}$, and the first moment of both $\dot{\hat{x}}_i(t)/x_{i,eq}$
and $\dot{x}_i(t)/x_{i,eq}$ is $T_{D_i}$. For this reason $\hat{x}_i(\cdot)$ is
frequently used as an estimate of the exact step response
waveform $x_i(\cdot)$ [13, 14]. Note from Fig. 2 that $\hat{x}_i(\cdot)$ is
always a feasible (but not generally optimal) solution
to the constrained optimal control problem and hence
must always lie between the step response bounds (26)
and (27).
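
As a worked check of the straight-line claim, (25) can be applied to the path $f_i = T_{D_i} g_i$ that joins the initial state $(T_{D_i}, 1)$ to the origin. The derivation below is a sketch added for illustration; it uses only (15), (18), (23) and (25), and the symbol $g_i^*$ for the target value as in Section V.

```latex
\begin{aligned}
t &= \int_{f_{final}}^{f_{initial}} \frac{1}{g_i}\, df_i
   = \int_{g_i^*}^{1} \frac{T_{D_i}}{g_i}\, dg_i
   = T_{D_i} \ln \frac{1}{g_i^*}
   && \text{(since } df_i = T_{D_i}\, dg_i \text{ on the line)} \\
\Rightarrow\quad g_i(t) &= e^{-t/T_{D_i}},
\qquad
x_i(t) = x_{i,eq}\,\bigl(1 - g_i(t)\bigr) = x_{i,eq}\,\bigl(1 - e^{-t/T_{D_i}}\bigr).
\end{aligned}
```

The path is feasible because $T_{R_i} \le T_{D_i} \le T_P$ places the line $f_i = T_{D_i} g_i$ inside the wedge (23), and $\dot{g}_i = -g_i/T_{D_i} \le 0$ satisfies the input constraint (18).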

The estimate $\hat{x}_i(\cdot)$ is an attempt to approximate the
step response of a high order system by that of one with
a single time constant: the bounds measure the worst
case error resulting from that approximation. Whenever
$T_P - T_{R_i} \ll T_P$ the wedge-shaped feasible region in
Figs. 2-5 becomes narrow, the maximum- and minimum-time
trajectories lie close together, and the bounds
become very tight: this is commonly the case in the
application below. The interested reader may wish to
verify the following relation between the time
constants and the tightness of the bounds:


$$\sup_{t \ge 0}\, \bigl(\bar{x}_i(t) - \underline{x}_i(t)\bigr) \le x_{i,eq} \left[ \frac{T_P - T_{R_i}}{T_P} \right].$$


VI.  Application  to  VLSI  CAD 


This paper expands and generalizes previous
research [1] focused on the task of estimating signal
propagation delay in branching interconnect lines on
MOS VLSI chips. The earlier theory relied on special
features of interconnect networks not shared by the
general M-matrix systems discussed here. This section
is confined to a brief description of the application:
full details are given in [1].

A reasonable electrical model for interconnect
paths on chips is a linear, nonuniform "RC tree" network
consisting of a tree of linear resistors driven
by an ideal voltage source and shunted by capacitors
to ground, as illustrated in Fig. 7. As many as 100,000
of these interconnect nets can be found on a single
chip, and they come in a great variety of sizes and
configurations. The propagation delay from the source
to any node functioning as an output is determined
from the step response at that node in conjunction
with knowledge of the switching voltage threshold of
the logic gate driven by that output. The bounds have
been found useful in practice [2-4] because they save
computer time: exact numerical calculation of the step
response of such a vast number of systems of moderately
high order is out of the question.


Fig. 7 Lumped, linear RC tree networks are useful
models for branched transmission lines on MOS
chips: their dynamics are described by
M-matrices.


It is not difficult to show [16] that the dynamics
of such networks are governed by equations of the
form (1), (2): of greater interest here is the fact
that the entries of $P = M^{-1}$ and of $x_{eq}$ in (5) can be
found by inspection. Using the capacitor voltages as
state variables, we see that no current flows in the
resistors at equilibrium, so the equilibrium voltages
are all 1 V for a unit step input. And it is easy to
show [1], [16] that the state equations for the step
response of any N-capacitor RC tree can be written in
the form

$$(\mathbf{1} - v) = [RC]\, \dot{v},$$

where $v \in \mathbb{R}^N$ is the vector of capacitor voltages,
$\mathbf{1} \in \mathbb{R}^N$ is a vector of 1's, $C \in \mathbb{R}^{N \times N}$ is a diagonal
matrix of the capacitor values, and $R \in \mathbb{R}^{N \times N}$ is
obtained by setting $R_{jk}$ equal to the sum of the
resistances along the path obtained by intersecting the
route from $C_j$ to the source with the route from $C_k$ to
the source. Thus $P = (RC)$ is obtained without matrix
inversion.
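
The "by inspection" construction of R can be sketched in a few lines. The representation below is an assumption made for illustration, not the paper's data structure: each capacitor node k stores the index of its parent node (`-1` for the source) and the resistance `r[k]` of the branch joining it to that parent, so the shared-path resistance $R_{jk}$ is the sum of `r` over the branches common to both routes to the source. NumPy is assumed.

```python
import numpy as np

def rc_tree_matrices(parent, r, c):
    """Build R and P = R C for an RC tree.
    parent[k]: parent node of node k (-1 means the voltage source);
    r[k]: resistance of the branch from node k to its parent;
    c[k]: capacitance from node k to ground."""
    n = len(parent)

    def route(k):
        # Branches (labelled by their child node) on the route from k to the source.
        nodes = set()
        while k != -1:
            nodes.add(k)
            k = parent[k]
        return nodes

    routes = [route(k) for k in range(n)]
    R = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            # Resistance shared by the two routes to the source.
            R[j, k] = sum(r[m] for m in routes[j] & routes[k])
    P = R @ np.diag(c)
    return R, P

# Source --1-- node 0, which branches to node 1 (2 ohms) and node 2 (3 ohms).
parent = [-1, 0, 0]
r = [1.0, 2.0, 3.0]
c = [1.0, 2.0, 3.0]
R, P = rc_tree_matrices(parent, r, c)
elmore_node1 = P[1, :].sum()   # T_D for node 1, since x_eq is a vector of 1's
```

For this three-node tree, R is [[1, 1, 1], [1, 3, 1], [1, 1, 4]] and the Elmore delay to node 1 is 1·1 + 3·2 + 1·3 = 10, obtained without any matrix inversion. The double loop is quadratic in the number of nodes and is meant only to mirror the definition.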


Additional  Results 


The alert reader may have noticed that the extremal
trajectories in Figs. 3-5 contain unrealistic vertical
segments representing discontinuous jumps in $g_i$. A
practical method of tightening the bounds by incorporating
finite slew rate limits was briefly described
in [17] along with a second bound-tightening technique
applicable only to RC trees.

A way to apply the results in this paper to an
important class of nonlinear RC circuits is given in
[18].
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Proof  of  Fact  1 


Consider (1) with a unit step input at t = 0.
Differentiating with respect to t yields the equation
governing the evolution of $\dot{x}$:

$$\ddot{x} = -M \dot{x}, \quad \forall t > 0, \tag{A.1}$$

$$\dot{x}(0^+) = b > 0.$$

Because the off-diagonal elements of -M are nonnegative,
the closed first orthant of $\mathbb{R}^N$, usually denoted $\mathbb{R}^N_+$,
is positive invariant under the flow of (A.1) [19].
Thus $\dot{x}(0^+) \ge 0 \Rightarrow \dot{x}(t) \ge 0$, $\forall t \ge 0$.


Proof of Fact 2


If i = j or i = k, (9) is trivial. If j = k, (9)
becomes $p_{ii}\, p_{kk} \ge p_{ki}\, p_{ik}$, which is true because the
determinant of any principal submatrix of an inverse
M-matrix is known to be positive [11, Cor. 1, p. 198].
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If  i,  j  and  k  are  all  distinct,  consider  the  3x3 
principal submatrix of P consisting of elements that
lie both in row i, j or k and in column i, j or k, and
denote it $\hat{P}$. Since any principal submatrix of an inverse
M-matrix is known to be an inverse M-matrix [5, p. 329],
$\hat{P}$ is "inverse M" and hence $(\hat{P}^{-1})_{kj} \le 0$. Taking the
inverse of $\hat{P}$ using cofactors, we have $0 \ge (\hat{P}^{-1})_{kj} =
\mathrm{cof}\, p_{jk} / \det \hat{P}$, and $\det \hat{P} > 0$ [11, p. 198]. Thus cof
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